Abstract: Bayesian Networks (BNs) are popular graphical models
for the representation of statistical problems embodying dependence
relationships between a number of variables. Much of this popularity
is due to the d-separation theorem of Pearl and Lauritzen, which allows
an analyst to identify the conditional independence statements
that a model of the problem embodies using only the topology of
the graph. However, for many problems the complete model dependence
structure cannot be depicted by a BN. The Chain Event
Graph (CEG) was introduced for these types of problem. In this paper
we introduce a separation theorem for CEGs, analogous to the
d-separation theorem for BNs, which likewise allows an analyst to
identify the conditional independence structure of their model from
the topology of the graph.
1 Introduction

If the DAG (directed acyclic graph) of a Bayesian Network (BN) has a vertex set
$\{X_1, X_2, \ldots, X_n\}$, then there are $n$ conditional independence assertions which
can simply be read off the graph. These are the properties that state that
a vertex-variable is independent of its non-descendants given its parents (the
directed local Markov property [20]). Answering most conditional independence
queries, however, is not so straightforward. The d-separation theorem for BNs
was first proved by Verma and Pearl [35], and an alternative version considered
in [22] [20] [9]. The theorem addresses whether the conditional independence
query $A \perp\!\!\!\perp B \mid C$ can be answered from the topology of the DAG of a BN,
where $A, B, C$ are disjoint subsets of the set of vertex-variables of the DAG.
Separation theorems have been proved for more general classes of graphical 
model including chain graphs [8], alternative chain graphs [2], and ancestral 
graphs [55 1. 

However, for many problems the available quantitative dependence information
cannot all be embodied in the DAG of a BN. The Chain Event Graph
(CEG) was introduced in 2006 [Mi EB] for the representation & analysis of
precisely these sorts of problems. There have been a dozen papers on CEGs
published since then, principally concerned with their use for problem
representation (eg. JJ), probability propagation [IT], learning and model
selection and causal analysis [321 EEQ]. The motivation for the
development of this class is that CEGs are probably the most natural graphical
models for discrete processes when elicitation involves questions about how
situations might unfold. Although the topology of these graphs is more complicated
than that of the BN, they are more expressive, as they allow us to represent
all structural quantitative information within the graph itself. Context-specific
symmetries which are not intrinsic to the structure of the BN [3 [23 HE EH
are fully expressed in the topology of the CEG, which also recognises logical
or structural zeros in probability tables, and the numbers of levels taken by
problem-variables. This last has been found to be essential to understanding
the geometry of BN models with hidden variables jl, M]. In this paper we
return to the mathematics underpinning CEG models, and provide a separation
theorem for these graphs.

The CEG is a tree-based graphical structure with a passing resemblance to
graphs such as Bozga & Maler’s probabilistic decision graph 0 (made popular
by Jaeger et al in ED- It differs from these in that edges in a CEG label
events that might happen to an individual in a population given a particular
partial history, and the coalescing of vertices & colouring of edges together
encode conditional independence/Markov structure. The colouring of CEGs and
their acyclicity also distinguish them from Markov state space diagrams. Finite
CEGs as discussed in this paper also have finite event spaces whose atoms
correspond to the distinct possible histories or developments that individuals in a
population might have. The tree-structure imparts to these atoms an additional
longitudinal element consisting of the stages of an individual’s development. We
note in passing that colour has recently been found to provide a valuable
embellishment to other graphical models (see for example USD-

Even more so than is the case with BNs, there are a number of conditional
independence properties which can simply be read off the CEG [M], and given
the tree-based nature of the CEG these properties are naturally context-specific.
That is to say they are properties of the form $A \perp\!\!\!\perp B \mid \Lambda$ for some event $\Lambda$. An
example would be that a particular lifestyle-related medical condition is
independent of gender given that the subject is a smoker. An analogous statement
for a discrete BN would be of the form

\[ p(A \mid B, C = c) = p(A \mid C = c) \]

for some subsets of variables $A, B, C$, some specific vector value $c$ of $C$ and
all vector values of $A$ and $B$. The class of conditioning events we can tackle
with a CEG is however much richer than that generally considered when using
BN-based analysis.

In Section 2 we use a toy example to introduce CEGs. A naive criticism 
of tree-based graphical structures is that they will be too complex for larger 
problems. We note that the picture is simply for the investigator’s (or a client’s)
benefit: as with any large system, analysts need to consider both local and global 
aspects - the full CEG for a large problem may exist only as a set of computer 
constraints; local aspects of the problem can be drawn out as a simple graph. 
Our example here is small so that we can use it effectively to illustrate key ideas. 

We then formally define a CEG, and explain how coalescence & colouring 
encode conditional independence structure. Events & random variables defined 
on CEGs are introduced through our example, as are sub-CEGs conditioned on 
an event of interest. 

In Section 3 we introduce elementary variables associated with the vertices 
of the CEG, and use these to construct a separation theorem. Comparison with 
the d-separation theorem for BNs is made through corollaries and our running 
example. Section 4 develops some of the ideas from earlier sections. 

2 Chain Event Graphs 

Definitions of Chain Event Graphs (of varying degrees of complexity) have
appeared in many of the previous papers on these graphs. We offer a detailed
formal definition here so that the theorems in later sections have a firm
mathematical foundation.

2.1 Event Trees 

We introduce CEGs in this section through the use of a toy example, simple 
enough to illustrate the key ideas. 

The CEG is a function of an event tree [32], and was created to overcome 
some of the shortcomings of these graphs. So we start by considering an event 
tree elicited from some expert. 

Example 1. A researcher is investigating a population of people whose parents
suffered from an inherited medical condition C. She has information on the
gender of each individual; if and when they displayed a symptom S (never, before
puberty, after puberty); and whether or not they developed the condition C.
Her current research is retrospective so she also has these individuals’ ages at
death. She suspects that the condition can lead to early death, so she produces 
an indicator that for each individual records whether or not they died before the 
age of 50. She knows that an individual who does not display symptom S will 
not develop the condition. 

An event tree for this information is given in Figure 1. 

The tree is a natural description of the problem. The labels on the edges 
of each root-to-leaf path (eg. male, display S before puberty, develop C, die 
before 50) in Figure 1 follow a temporal order, and the absence of condition 
edges in the 9th, 10th, 19th & 20th such paths reflects the expert’s knowledge 
that individuals who do not display S do not develop C. 
Figure 1: Event tree for Example 1


An event tree $T$ is a connected directed graph with no cycles. It has one root
vertex ($v_0$) with no parents, whilst all other vertices have exactly one parent. A
leaf-vertex is a vertex with no children. We denote the vertex set of $T$ by $V(T)$
and the edge set by $E(T)$. A directed root-to-leaf path in $T$ is called a route.

Although an event tree could be used to represent an observer’s beliefs about 
the possible developments of some individual, we make the assumption that the 
tree relates to a population. Hence the routes of the tree describe precisely 
the possible developments or histories that an individual in the population can 
experience. This description takes the form of a sequence of edge-labels, each 
describing what can happen next at a vertex. So in Figure 1 for example, an
individual who is male reaches vertex $v_1$ where the possible immediate
developments are that he displays S before puberty, after puberty or not at all.

We specify that the edges leaving any vertex in the tree must have distinct 
labels; that each individual can only pass along one edge leaving any vertex, 
and the choice of which edge is determined only by the variable controlling 
the next stage of development (eg. symptom) and not by any possible further 
developments downstream of these edges (ie. towards the leaves). 

We also require that each route corresponds to a real possible development
or history of an individual in the population. So each such path has a non-zero
probability that some individual might take this path. Also, the number
of routes corresponds exactly to the number of distinct possible histories or
developments (defined by the edge-labels) that some individual could experience.

Once we have a set of routes, and some ordering on these paths, then the
edge-labels define the tree structure. In our example the first variable in our
order is gender, so $v_0$ has two emanating edges, labelled male & female. The
second variable is symptom, so $v_1$ & $v_2$ both have three emanating edges, labelled
displays S before puberty, displays S after puberty & never displays S. We know
that individuals who never display S will not develop condition C, so the edges
emanating from $v_5$ & $v_8$ label the possible values of the life-expectancy indicator,
whereas those emanating from $v_3$, $v_4$, $v_6$ & $v_7$ label the possible values of
condition.

In this paper we use the notation $\lambda$ to denote a route, and the set of routes
of $T$ is labelled $\Lambda(T)$. When the tree is applied to a population, each route $\lambda$
corresponds to a possible history or development of an individual in the
population, and hence to an atom in an event space defined by the tree. The sigma
field of events associated with $T$ is then the set of all possible unions of atoms
$\lambda$ in $\Lambda(T)$. Note that the tree encodes an additional longitudinal development
or history for the individual, not encoded by the sigma field alone [32]. Events
in the sigma field of the tree are denoted $\Lambda$.

So for instance, in Example 1 the event $\Lambda$ corresponding to displayed S before
puberty and died before the age of 50 is simply the union of the 1st, 3rd, 11th
& 13th routes in the tree in Figure 1.
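
As an illustration, the routes of the tree in Example 1 can be enumerated mechanically. The following Python sketch assumes that routes are numbered from the top of Figure 1 with males first, then by symptom (before, after, never), condition and life expectancy; this ordering and the shortened edge-labels are our assumptions, not part of the definition of an event tree.

```python
from itertools import product

GENDERS = ["male", "female"]
SYMPTOMS = ["S before puberty", "S after puberty", "never S"]
CONDITIONS = ["develop C", "no C"]
DEATHS = ["die before 50", "die after 50"]


def routes():
    """Yield each root-to-leaf path as a tuple of edge-labels, in tree order."""
    for gender in GENDERS:
        for symptom in SYMPTOMS:
            if symptom == "never S":
                # No condition edge: these individuals cannot develop C.
                for death in DEATHS:
                    yield (gender, symptom, death)
            else:
                for condition, death in product(CONDITIONS, DEATHS):
                    yield (gender, symptom, condition, death)


ALL_ROUTES = list(routes())

# The event "displayed S before puberty and died before 50" is a union of atoms;
# we record the (1-based) indices of the routes that make it up.
EVENT = [i for i, r in enumerate(ALL_ROUTES, start=1)
         if "S before puberty" in r and "die before 50" in r]

# Routes with no condition edge (the individual never displays S).
SHORT_ROUTES = [i for i, r in enumerate(ALL_ROUTES, start=1) if len(r) == 3]

print(len(ALL_ROUTES), EVENT, SHORT_ROUTES)
```

Under the assumed ordering the tree has 20 routes, the event consists of routes 1, 3, 11 and 13, and the routes without a condition edge are the 9th, 10th, 19th and 20th, as stated in the text.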

Example 1 continued. Our researcher has done sufficient analysis of the data 
to tell us that: 

• life expectancy of individuals in this population is independent of gender 
given that S is not displayed, 

• males who display S at any point and females who display S before puberty 
have the same joint probability distribution over the variables condition and life 
expectancy. 

Moreover 

• if and when an individual displays S is independent of gender, 
and she believes that 

• males and females who display S at any point have the same probability of 
developing the condition. 

It is the fact that traditional trees cannot readily depict this sort of information
which has led to tree-based analysis not receiving the attention it deserves.
It is actually relatively easy to portray these types of conditional independence
or Markov properties on a tree - all we need to do is add colour to the edges, as
in Figure 2 (where edges with the same colouring carry the same probability).

Despite the colouring this is still a rather cumbersome representation. To
make it more compact we use the idea of coalesced trees, used in decision analysis
since m- In a coalesced event tree, vertices from which the sets of possible
complete future developments have the same probability distribution are coalesced.

Figure 2: Coloured event tree for Example 1


So in the tree in Figure 2 we can coalesce the vertices $v_3$, $v_4$ & $v_6$ and also the
vertices $v_5$ & $v_8$ (the vertices $v_9$, $v_{11}$ & $v_{13}$ and $v_{10}$, $v_{12}$ & $v_{14}$ are also coalesced,
but this coalescence is in a sense absorbed into that of $v_3$, $v_4$ & $v_6$).

The combination of colouring and coalescence gives us a more compact graph 
that allows us to portray all conditional independence properties of the type 
described in Example 1 above. 

So in Figure 3, the first two of the four statements provided by our researcher
are depicted by the coalescence, but the latter two require the colouring of the
edges leaving $v_1$ & $v_2$ and $v_{3+4+6}$ & $v_7$. The colouring of the edges emanating
from $v_{9+11+13}$ & from $v_{10+12+14}$ is suppressed as it no longer yields any extra
information.

2.2 Probabilities on Trees 

In section 2.1 we talked in general terms about probabilities on trees. In this 
section we formalise these ideas. 

Figure 3: Coalesced event tree for Example 1

In Figure 1 the probability of the atom {male, displayed S before puberty,
developed C, died before 50} is clearly:

\[
p(\text{male}) \times p(\text{displayed S before puberty} \mid \text{male})
\times p(\text{developed C} \mid \text{male, displayed S before puberty})
\times p(\text{died before 50} \mid \text{male, displayed S before puberty, developed C})
\]

which can be written as

\[ \pi_e(v_1 \mid v_0) \, \pi_e(v_3 \mid v_1) \, \pi_e(v_9 \mid v_3) \, \pi_e(v_{17} \mid v_9) \]

where $\pi_e(v_{17} \mid v_9)$ is the probability of an individual having reached the vertex $v_9$
in Figure 1 (ie. they are male, displayed S before puberty & developed C) then
taking the edge $e(v_9, v_{17})$ to reach the vertex $v_{17}$ (ie. they die before 50) etc.

We can assign a probability to each atom of the event space as:

\[ p(\lambda) = \prod_{e(v,v') \in \lambda} \pi_e(v' \mid v) \]

where $e(v,v')$ means the edge from vertex $v$ to vertex $v'$, $e(v,v') \in \lambda$ means
that $e(v,v')$ lies on the route $\lambda$, and $\pi_e(v' \mid v)$ is the conditional probability of
traversing the edge $e(v,v')$ given that the individual has reached the vertex $v$.

We call the probabilities $\{\pi_e(v' \mid v)\}$ the primitive probabilities of the tree $T$.
The set $\{p(\lambda)\}$ defines a probability measure over the sigma field of events
formed by the atoms $\lambda \in \Lambda(T)$. Strictly speaking these probabilities are the
fundamental probabilities of the system as they are the probabilities of the
atoms. Each primitive probability has then a unique value determined by these
$\{p(\lambda)\}$. The conditional probability of an edge $e(v,v')$ is given by:


\[ \pi_e(v' \mid v) = \frac{\sum_{\lambda : e(v,v') \in \lambda} p(\lambda)}{\sum_{\lambda : v \in \lambda} p(\lambda)} \]

where the numerator is the sum of the probabilities of all routes utilising the edge
$e(v, v')$, and the denominator is the sum of the probabilities of all routes passing
through the start-vertex $v$ of the edge $e(v,v')$. So for example in Figure 1 we
have:

\[ \pi_e(v_{17} \mid v_9) = \frac{p(\lambda_1)}{p(\lambda_1) + p(\lambda_2)} \]

where $\lambda_1$ is the atom corresponding to the route $v_0 \to v_{17}$ ($\lambda(v_0, v_{17})$) and $\lambda_2$
is the atom corresponding to the route $v_0 \to v_{18}$ ($\lambda(v_0, v_{18})$).

In practice however, our elicitation of the tree is likely to yield primitive 
probabilities of the sort described above, rather than probabilities of atoms. 
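
The two-way relationship between atom probabilities and primitive probabilities can be checked on a small case. The sketch below uses a hypothetical two-level tree (the vertex names and probabilities are invented for illustration), builds the atom probabilities as products along routes, and then recovers each primitive probability as the ratio of sums described above.

```python
from fractions import Fraction as F
from math import prod

# Hypothetical event tree: EDGES[(parent, child)] = primitive probability.
EDGES = {
    ("v0", "v1"): F(1, 2), ("v0", "v2"): F(1, 2),
    ("v1", "v3"): F(1, 4), ("v1", "v4"): F(3, 4),
    ("v2", "v5"): F(2, 3), ("v2", "v6"): F(1, 3),
}


def children(v):
    return [c for (p, c) in EDGES if p == v]


def routes(v="v0"):
    """Return every path from v to a leaf as a list of edges."""
    kids = children(v)
    if not kids:
        return [[]]
    return [[(v, c)] + tail for c in kids for tail in routes(c)]


def atom_probability(route):
    """p(lambda): product of the primitive probabilities along the route."""
    return prod(EDGES[e] for e in route)


ATOMS = {tuple(r): atom_probability(r) for r in routes()}


def edge_probability(v, v_next):
    """Recover pi_e(v' | v) from the atom probabilities alone."""
    through_edge = sum(p for r, p in ATOMS.items() if (v, v_next) in r)
    through_vertex = sum(p for r, p in ATOMS.items()
                         if any(v in e for e in r))
    return through_edge / through_vertex


for (v, v_next), primitive in EDGES.items():
    print(v, v_next, primitive, edge_probability(v, v_next))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the recovered values can be compared with the primitive probabilities without rounding error.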


2.3 Positions and Stages 

To allow our event tree to encode the full conditional independence structure of 
the model we introduce two partitions of the tree’s vertices. 

Let $V^0(T)$ ($\subset V(T)$) be the set of non-leaf vertices of $T$ (called situations
in [32]). Also let $v \prec v'$ denote that the vertex $v$ precedes the vertex $v'$ on some
route.

Then for any non-leaf vertex $v_a \in V^0(T)$ and leaf-vertex $v'' \in V(T) \setminus V^0(T)$
such that $v_a \prec v''$, there is a unique subpath $\mu(v_a, v'')$ comprising the edges
of the route $\lambda(v_0, v'')$ which lie between the vertices $v_a$ and $v''$.

Let:

\[ \pi_\mu(v'' \mid v_a) = \prod_{e(v,v') \in \mu(v_a, v'')} \pi_e(v' \mid v) \]

Now each vertex $v \in V^0(T)$ labels a random variable $J(v)$ whose state space
$\mathbb{J}(v)$ can be identified with the set of $v \to$ leaf subpaths $\{\mu(v, v'')\}$.

Definition 1. Positions. For an Event Tree $T(V(T), E(T))$, the set $V^0(T)$
is partitioned into equivalence classes, called positions, as follows:

Vertices $v_a, v_b \in V^0(T)$ are members of the same equivalence class (position) if
there is a bijection $\phi$ between $\mathbb{J}(v_a)$ and $\mathbb{J}(v_b)$ such that if
$\phi : \mu(v_a, v_a'') \mapsto \mu(v_b, v_b'')$, then

(a) the ordered sequence of edge-labels is identical for $\mu(v_a, v_a'')$ and for
$\mu(v_b, v_b'')$;

(b) $\pi_\mu(v_a'' \mid v_a) = \pi_\mu(v_b'' \mid v_b)$.

Now, from section 2.1, tree structure is defined by the edge-labels, so (a)
above means that the subtrees rooted in $v_a$ and $v_b$ have identical topological
structure.

Similarly, from section 2.2, our edge probabilities are uniquely defined by the
route probabilities. We can see that the edge probabilities in the subtrees rooted
in $v_a$ and $v_b$ must be uniquely defined by the sets of probabilities $\{\pi_\mu(v_a'' \mid v_a)\}$
and $\{\pi_\mu(v_b'' \mid v_b)\}$. So (b) above means that the corresponding edge probabilities
in these two subtrees are equal.

So two vertices in a tree are in the same position if the sets of possible
complete future developments from these vertices have the same probability
distribution. We denote the set of positions of $T$ by $P(T)$.

We noted earlier that knowing this partition of vertices is insufficient for 
us to fully describe the conditional independence structure of the tree, so we 
introduce a second partition. 

Each vertex $v \in V^0(T)$ also labels a random variable $K(v)$ whose state space
$\mathbb{K}(v)$ can be identified with the set of directed edges $e(v, v')$ emanating from $v$.

Definition 2. Stages. For an Event Tree $T(V(T), E(T))$, the set $V^0(T)$ is
partitioned into equivalence classes, called stages, as follows:

Vertices $v_a, v_b \in V^0(T)$ are members of the same equivalence class (stage) if
there is a bijection $\psi$ between $\mathbb{K}(v_a)$ and $\mathbb{K}(v_b)$ such that if
$\psi : e(v_a, v_a') \mapsto e(v_b, v_b')$, then $\pi_e(v_a' \mid v_a) = \pi_e(v_b' \mid v_b)$.

So two vertices in a tree are in the same stage if their sets of emanating edges 
have the same probability distribution. 

Note that the set of stages is coarser than the set of positions, and that 
vertices in the same position are necessarily in the same stage. 

We also add colouring to trees to illustrate the stage structure. So vertices 
in the same stage are given the same colour, and edges emanating from vertices 
in the same stage are coloured according to their probabilities / labels. This 
induces a partition on E(T). 
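
The two partitions can be computed directly from a tree with edge-probabilities. The sketch below encodes the tree of Example 1 as nested dictionaries and groups situations by a "signature": the whole labelled subtree for positions, and the emanating edges only for stages. The life-expectancy probabilities are invented for illustration, the vertex numbering is assumed to be breadth-first over situations, and matching on edge-labels as well as probabilities when forming stages is a simplification of Definition 2, which only asks for a probability-preserving bijection.

```python
from collections import defaultdict, deque
from fractions import Fraction as F


def death(p):
    """Floret over the life-expectancy indicator; None marks a leaf."""
    return {"die<50": (p, None), "die>=50": (1 - p, None)}


def condition(p_c, p_die_c, p_die_no_c):
    return {"C": (p_c, death(p_die_c)), "no C": (1 - p_c, death(p_die_no_c))}


def symptom(after_subtree):
    usual = condition(F(1, 2), F(3, 5), F(1, 10))
    return {"before": (F(1, 4), usual),
            "after": (F(1, 4), after_subtree),
            "never": (F(1, 2), death(F(1, 5)))}


# Females who display S after puberty get a different (hypothetical) subtree.
TREE = {"male": (F(1, 2), symptom(condition(F(1, 2), F(3, 5), F(1, 10)))),
        "female": (F(1, 2), symptom(condition(F(1, 2), F(2, 5), F(1, 4))))}


def situations(tree):
    """Yield (name, node) for non-leaf vertices in breadth-first order."""
    queue, index = deque([tree]), 0
    while queue:
        node = queue.popleft()
        yield f"v{index}", node
        index += 1
        queue.extend(sub for _, sub in node.values() if sub is not None)


def position_signature(node):
    """Labels and probabilities of the whole subtree (Definition 1)."""
    return tuple((lab, p, position_signature(sub) if sub is not None else None)
                 for lab, (p, sub) in node.items())


def stage_signature(node):
    """Labels and probabilities of the emanating edges only."""
    return tuple((lab, p) for lab, (p, _) in node.items())


def partition(signature):
    groups = defaultdict(list)
    for name, node in situations(TREE):
        groups[signature(node)].append(name)
    return sorted(groups.values())


POSITIONS = partition(position_signature)
STAGES = partition(stage_signature)
print(POSITIONS)
print(STAGES)
```

With these assumed probabilities the vertices v3, v4 & v6 share a position, as do v5 & v8, while v7 joins v3, v4 & v6 only at the level of stages.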

2.4 Chain Event Graphs 

The Chain Event Graph $C$ is a directed acyclic graph (DAG), which is connected,
having a unique root vertex (with no incoming edges) and a unique sink vertex
(with no outgoing edges). Unlike the BN, more than one edge can exist between
two vertices of a CEG. The CEG also generally has its vertices and edges
coloured, although most of this paper will deal with uncoloured versions called
Simple CEGs. The root and sink vertices of a CEG are labelled $w_0$ and $w_\infty$.

Definition 3. Chain Event Graph. The CEG $C(T)$ (a function of the tree
$T(V(T), E(T))$) is the graph with vertex set $V(C)$ and edge set $E(C)$ defined by:

1. $V(C) = P(T) \cup \{w_\infty\}$;

2. (a) For $w, w' \in V(C) \setminus \{w_\infty\}$ there is a directed edge $e(w, w') \in E(C)$
iff there are vertices $v, v' \in V^0(T)$ such that the vertex $v$ is in the
position $w$ ($\in P(T)$), $v'$ is in the position $w'$ ($\in P(T)$), and there is
an edge $e(v, v') \in E(T)$;

(b) For $w \in V(C) \setminus \{w_\infty\}$ there is a directed edge $e(w, w_\infty) \in E(C)$
iff there is a vertex $v \in V^0(T)$ such that $v$ is in the position $w$
($\in P(T)$), and there is an edge $e(v, v') \in E(T)$ for some leaf-vertex
$v' \in V(T) \setminus V^0(T)$.

Figure 4: CEG for Example 1

Note that the vertex set of $C(T)$ consists of the positions of $T$ and the sink-vertex
$w_\infty$. Positions in $C(T)$ are said to be in the same stage if the component
vertices (in $T$) of these positions are in the same stage. Colouring in $C(T)$ is
inherited from $T$. The constraints associated with the positions & stages of a
CEG hold for the entire population to which the CEG has been applied.

Example 1 continued. To convert the coalesced tree from Figure 3 to a CEG
is straightforward. We simply combine the leaf-vertices into a sink-vertex $w_\infty$
as in Figure 4.

The positions here are $w_0$ through $w_9$. We note that $w_1$ & $w_2$ are in the
same stage (as can be seen from the colouring), $w_3$ & $w_4$ are in the same stage,
and each of $w_5$ through $w_9$ is in a stage by itself.

The position $w_3$ encodes the conditional independence / Markov property
that males who display S at any point and females who display S before puberty
have the same joint probability distribution over the variables condition and life
expectancy. The position $w_5$ encodes the property that life expectancy of
individuals in this population is independent of gender given that S is not displayed.
The stage $\{w_1, w_2\}$ encodes the property that if and when an individual
displays S is independent of gender. The stage $\{w_3, w_4\}$ encodes the property that
condition is independent of gender given S displayed.

Without ambiguity we simplify our notation C(T) to C. 

Analogously with the tree, a directed $w_0 \to w_\infty$ path $\lambda$ in $C$ is called a route.
The set of routes of $C$ is labelled $\Lambda(C)$. We write $w \prec w'$ when the position $w$
precedes the position $w'$ on a route.

When the CEG is applied to a population, each route $\lambda$ corresponds to a
possible history or development of an individual in the population, and hence
to an atom in the event space defined by the CEG. The sigma field of events
associated with $C$ is then the set of all possible unions of atoms $\lambda$ in $\Lambda(C)$. Like
the tree, the CEG encodes an additional longitudinal development or history
for the individual, not encoded by the sigma field alone. Events in the sigma
field of the CEG are denoted $\Lambda$.

Note that the number of routes in the CEG equals the number in the tree and 
corresponds exactly to the number of possible distinct histories or developments 
that some individual in the population could experience. And since no route has 
a zero probability, all edges in the CEG have non-zero conditional probabilities 
associated with them. 

Because the CEG’s atoms have this implicit longitudinal development associated
with them, certain events in the sigma field are particularly important.
Let $\Lambda(w)$ denote the event that an individual unit takes a route that passes
through the position $w \in V(C)$. $\Lambda(w, w')$ is then the union of all routes passing
through the positions $w$ and $w'$, $\Lambda(e(w, w'))$ is the union of all routes passing
through the edge $e(w, w')$, and $\Lambda(\mu(w, w'))$ is the union of all routes utilising
the subpath $\mu(w, w')$.

2.5 Probabilities on CEGs 

As with trees, underlying the CEG there is a probability space which is specified
by assigning probabilities to the atoms. For each position $w \in V(C) \setminus \{w_\infty\}$
and edge $e(w, w')$ emanating from $w$, we denote by $\pi_e(w' \mid w)$ the probability
of traversing the edge $e(w, w')$ conditional on having reached the position $w$.
We call the probabilities $\{\pi_e(w' \mid w) : e(w, w') \in E(C), w \in V(C) \setminus \{w_\infty\}\}$ the
primitive probabilities of $C$.

Then, analogously with trees, for each atom $\lambda$:

\[ p(\lambda) = \prod_{e(w,w') \in \lambda} \pi_e(w' \mid w) \]

as both the atoms and the primitive probabilities are identical to the
corresponding atoms and primitive probabilities in the tree.

The set $\{p(\lambda)\}$ defines a probability measure over the sigma field of events
formed by the atoms $\lambda \in \Lambda(C)$.

This assignment of probabilities implicitly demands a Markov property over
the flow of the units through the graph. Thus, in the context of our medical
example, the probability of an individual with attributes (male, displayed
symptom S before puberty), (male, displayed symptom S after puberty) or (female,
displayed symptom S before puberty) developing the condition depends
only on the fact that the subpaths corresponding to these pairs of attributes
terminate at the position $w_3$, and not on the particular subpath leading to $w_3$.
The probability this individual develops the condition is then $\pi_e(w_6 \mid w_3) =
p(\Lambda(e(w_3, w_6)) \mid \Lambda(w_3))$. So we only need to know the position a unit has
reached in order to predict as well as is possible what the next unfolding of its
development will be.

This Markov hypothesis looks strong but in fact holds for many families of
statistical model. For example, all event tree descriptions of a problem satisfy
this property, as do all finite state space context-specific Bayesian Networks
and many other structures [34].

We can go further and state that the sets of possible future developments
(whether or not they developed the condition and whether or not they died
before the age of 50) for individuals taking any of these three subpaths must
be the same. Moreover the conditional probability of any particular subsequent
development must be the same for individuals taking any of these three subpaths.

Note also that if positions $w_a$ and $w_b$ are such that the sets of possible
future developments from $w_a$ and $w_b$ are identical, and the conditional joint
probability distributions over these sets are identical, then $w_a$ and $w_b$ are the
same position, and must be coalesced for our graph to be a CEG.

The probability of any event $\Lambda$ in the sigma field is hence of the form

\[ p(\Lambda) = \sum_{\lambda \in \Lambda} \prod_{e(w,w') \in \lambda} \pi_e(w' \mid w) \]

where $\lambda \in \Lambda$ means that $\lambda$ is one of the component atoms of the event $\Lambda$.
In this paper we will also use the following further notation:
$\pi_\mu(w' \mid w) = p(\Lambda(\mu(w, w')) \mid \Lambda(w))$ denotes the probability of utilising the
subpath $\mu(w, w')$ (conditional on passing through $w$),

$\pi(w' \mid w) = p(\Lambda(w, w') \mid \Lambda(w)) = \sum_{\mu} \pi_\mu(w' \mid w)$ denotes the probability of
arriving at $w'$ conditional on passing through $w$.

Expressing a problem as a CEG allows domain experts to check their beliefs 
in a very straightforward manner: 

We stated in Section 2.1 that the expert in our example believed that males
& females who display S at any point have the same probability of developing C.
This is depicted in the colouring of the edges emanating from $w_3$ & $w_4$ in
Figure 4. Our expert can now use the techniques developed in nuns sum to test
the model represented by Figure 4 against alternative models with different
conditional independence / Markov structure. Such a test might yield information
that grouped the vertices $v_3$, $v_4$, $v_6$ & $v_7$ from Figure 1 into different positions
than those in Figure 4; or that the vertices $v_3$, $v_4$ & $v_6$ are indeed in the same
stage and position, but that the vertex $v_7$ is not in this stage (ie. the probability
of developing C is different for females who display S after puberty), and so the
edges leaving $w_3$ & $w_4$ in Figure 4 would no longer have the same colouring.
2.6 Conditioning on events

Most conditional independence queries that could realistically be of interest to 
an analyst can be answered purely by inspecting the topology of a CEG. And 
most of these queries involve conditioning on what is known as an intrinsic 
event. 

Definition 4. Intrinsic events. Let $C_\Lambda$ be the subgraph of $C$ consisting of only
those positions and edges that lie on a route $\lambda \in \Lambda$, and the sink-vertex $w_\infty$.
$\Lambda$ is intrinsic to $C$ if the number of $w_0 \to w_\infty$ paths in $C_\Lambda$ equals the number
of atoms in the set $\{\lambda\}_{\lambda \in \Lambda}$.

The idea of intrinsic events is closely related to that of faithfulness in BNs [22
[56]. Note that each atom (& therefore edge) in $C_\Lambda$ must by construction have
a non-zero probability, but edge-probabilities in $C_\Lambda$ may differ from those in $C$
since some vertices in $C_\Lambda$ will have fewer emanating edges than they have in $C$,
and the probabilities on the emanating edges of any vertex must sum to one.

All atoms of the sigma field of $C$ are intrinsic, as are $\Lambda(w)$, $\Lambda(w, w')$, $\Lambda(e(w, w'))$,
$\Lambda(\mu(w, w'))$ (provided these are non-empty), and as is the exhaustive set $\Lambda(w_0)$.
If we include the empty set in the set of intrinsic events then we note that
intrinsic sets are closed under intersection and so technically form a $\pi$-system (see
for example [183) we can associate with the CEG $C$.

Not all events in the sigma field are necessarily intrinsic, because the class of
intrinsic events is not closed under union. For example, for the CEG in Figure 4,
the event $\Lambda$ consisting of the union of the two atoms described by the routes
male, display S before puberty, develop C, die before 50 and male, display S after
puberty, develop C, die after 50 produces a subgraph $C_\Lambda$ which has four distinct
routes, so $\Lambda$ is not intrinsic. However, our interest in intrinsic events is that we
can condition on them, and we show below that conditioning on intrinsic events
often destroys the stage-structure of $C$. Conditioning on non-intrinsic events
usually destroys position-structure. From this we argue that if we know that we 
wish to condition on an event such as the one described above, we would simply 
sacrifice the position-structure of our CEG (knowing that it would probably be 
lost in the conditioning anyway) and split (uncoalesce) the position $w_3$ to form
a graph for which this event is intrinsic. 
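
Definition 4 lends itself to a mechanical check: collect the edges used by the atoms of the event, count the root-to-sink paths of the resulting subgraph, and compare with the number of atoms. The sketch below does this for the CEG of Figure 4, with the edge list transcribed from the description in the text; the shortened edge-labels and the helper names `routes`, `route` and `is_intrinsic` are ours.

```python
# Each edge is (source position, edge-label, target position); the CEG is a
# multigraph, so two edges may share the same end-points.
CEG_EDGES = [
    ("w0", "male", "w1"), ("w0", "female", "w2"),
    ("w1", "before", "w3"), ("w1", "after", "w3"), ("w1", "never", "w5"),
    ("w2", "before", "w3"), ("w2", "after", "w4"), ("w2", "never", "w5"),
    ("w3", "C", "w6"), ("w3", "no C", "w7"),
    ("w4", "C", "w8"), ("w4", "no C", "w9"),
] + [(w, d, "w_inf") for w in ("w5", "w6", "w7", "w8", "w9")
     for d in ("die<50", "die>=50")]


def routes(edges, start="w0", sink="w_inf"):
    """All directed start-to-sink paths, each as a tuple of edges."""
    if start == sink:
        return [()]
    return [(e,) + tail for e in edges if e[0] == start
            for tail in routes(edges, e[2], sink)]


ALL_ROUTES = routes(CEG_EDGES)


def route(*labels):
    """Look up the unique route with the given sequence of edge-labels."""
    (match,) = [r for r in ALL_ROUTES if tuple(e[1] for e in r) == labels]
    return match


def is_intrinsic(event):
    """Definition 4: the subgraph must contain exactly as many routes as atoms."""
    sub_edges = list({e for r in event for e in r})
    return len(routes(sub_edges)) == len(set(event))


lam_a = route("male", "before", "C", "die<50")
lam_b = route("male", "after", "C", "die>=50")
print(is_intrinsic([lam_a]), is_intrinsic([lam_a, lam_b]))
```

For the two-atom event discussed above the subgraph contains four routes, so the check fails, whereas any single atom passes.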

Even without such sleight of hand, the class of intrinsic events is rich enough
to encompass virtually all of the conditioning events in the conditional independence
statements we would like to query. In particular, if our model can be
expressed as a BN (with vertex-variables $\{X_j\}$) then any set of observations
expressible in the form $\{X_j \in A_j\}$ ($\{A_j\}$ subsets of the sample spaces of $\{X_j\}$) is a
proper subset of the set of intrinsic events defined on the CEG of our model [41].

Example 1 continued. Suppose for illustrative convenience that the edges labelled
male, female, displayed S before puberty, displayed S after puberty, never
displayed S, developed C, did not develop C in our CEG have the probabilities
$\frac{1}{2}, \frac{1}{2}, \frac{1}{4}, \frac{1}{4}, \frac{1}{2}, \frac{1}{2}, \frac{1}{2}$. Now let us condition on the event $\Lambda$ which is the union of
all routes except {female, displayed S after puberty, did not develop C, died
before 50} and {female, displayed S after puberty, did not develop C, died after
50}. This event $\Lambda$ is clearly intrinsic to $C$, and has the probability $p(\Lambda) = \frac{15}{16}$.

When we condition on $\Lambda$, the routes $\lambda$ which are components of $\Lambda$ get new
probabilities $p(\lambda \mid \Lambda) = p(\lambda, \Lambda)/p(\Lambda) = p(\lambda)/p(\Lambda)$. In this case each route has
its probability multiplied by $\frac{16}{15}$. We leave it as a (simple) exercise to show that
all edge-probabilities remain unchanged except:
all edge-probabilities remain unchanged except: 


$\pi_e(w_1 \mid w_0)$ becomes 8/15
$\pi_e(w_2 \mid w_0)$ becomes 7/15
$\pi_e(w_3 \mid w_2)$ becomes 2/7
$\pi_e(w_4 \mid w_2)$ becomes 1/7
$\pi_e(w_5 \mid w_2)$ becomes 4/7
$\pi_e(w_8 \mid w_4)$ becomes 1
the edge $e(w_4, w_9)$ does not exist in $C_\Lambda$

So the positions $w_1$ and $w_2$ are no longer in the same stage.
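
The conditioned edge-probabilities listed above can be checked numerically. The sketch below takes the unconditioned probabilities to be 1/2 for each gender, 1/4, 1/4 and 1/2 for the three symptom edges and 1/2 for developing C (the values consistent with the conditioned probabilities in the list); the life-expectancy probabilities are not given in the example and are invented here. Conditioning is done directly, by restricting to the routes of the event and renormalising.

```python
from fractions import Fraction as F
from math import prod

half, quarter = F(1, 2), F(1, 4)
# PI[(source, label, target)] = primitive probability in the CEG of Figure 4.
PI = {
    ("w0", "male", "w1"): half, ("w0", "female", "w2"): half,
    ("w1", "before", "w3"): quarter, ("w1", "after", "w3"): quarter,
    ("w1", "never", "w5"): half,
    ("w2", "before", "w3"): quarter, ("w2", "after", "w4"): quarter,
    ("w2", "never", "w5"): half,
    ("w3", "C", "w6"): half, ("w3", "no C", "w7"): half,
    ("w4", "C", "w8"): half, ("w4", "no C", "w9"): half,
}
# Hypothetical life-expectancy probabilities (not taken from the example).
for w, p_die in {"w5": F(1, 5), "w6": F(3, 5), "w7": F(1, 10),
                 "w8": F(2, 5), "w9": F(1, 4)}.items():
    PI[(w, "die<50", "w_inf")] = p_die
    PI[(w, "die>=50", "w_inf")] = 1 - p_die


def routes(start="w0"):
    """All directed paths from start to the sink, as tuples of edges."""
    if start == "w_inf":
        return [()]
    return [(e,) + tail for e in PI if e[0] == start for tail in routes(e[2])]


def p(route):
    return prod(PI[e] for e in route)


# The conditioning event: every route except those through (w4, "no C", w9).
EVENT = [r for r in routes() if ("w4", "no C", "w9") not in r]
P_EVENT = sum(p(r) for r in EVENT)


def conditioned(edge):
    """Edge-probability after conditioning: mass through the edge divided by
    mass through its source position, both restricted to the event."""
    through_edge = sum(p(r) for r in EVENT if edge in r)
    through_source = sum(p(r) for r in EVENT
                         if any(e[0] == edge[0] for e in r))
    return through_edge / through_source


for edge in [("w0", "male", "w1"), ("w0", "female", "w2"),
             ("w2", "before", "w3"), ("w2", "after", "w4"),
             ("w2", "never", "w5"), ("w4", "C", "w8")]:
    print(edge, conditioned(edge))
```

The invented life-expectancy values do not affect any of the listed probabilities, since every route downstream of a retained edge is either wholly inside or wholly outside the event.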


So, as already noted, conditioning on an intrinsic event can destroy stage- 
structure. This leads us to define an uncoloured version of the CEG. 

Definition 5. Simple CEG. A simple CEG (sCEG) is a CEG where there are
no constraints on edge-probabilities, except that (i) all edge-probabilities must 
be greater than zero (a consequence of the requirement we made for trees), and 
(ii) the sum of emanating-edge-probabilities for any position must equal one. 

What this means in practice is that stage-structure is suppressed: there are 
no stages which are not positions, and so colouring is redundant. There is an 
analogy here with BNs to which one can always add edges, and sacrifice a little 
conditional independence structure. 

We show now that the class of sCEG models is closed under conditioning on 
an intrinsic event: 


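Theorem 1. For an event $\Lambda$, intrinsic to $\mathcal{C}$, the subgraph $\mathcal{C}_\Lambda$ is an sCEG. If the
probability of any route $\lambda$ in the sigma field of $\mathcal{C}_\Lambda$ is given by $p_\Lambda(\lambda) = p(\lambda \mid \Lambda)$,
then the edge-probabilities in $\mathcal{C}_\Lambda$ are given by:

$$\hat{\pi}_e(w' \mid w) = \frac{p(\Lambda \mid \Lambda(e(w,w')))}{p(\Lambda \mid \Lambda(w))} \; \pi_e(w' \mid w)$$

(where $\pi_e$ denotes an edge-probability in $\mathcal{C}$ and $\hat{\pi}_e$ the corresponding edge-probability in $\mathcal{C}_\Lambda$).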

The proof of this theorem is in the appendix. We note that this result has
been successfully used to develop fast propagation algorithms for CEGs [?].
Note that the probability of an atom $\lambda$ in $\mathcal{C}$ conditioned on the intrinsic
event $\Lambda$ is the probability of that atom in the sCEG $\mathcal{C}_\Lambda$ (denoted $p_\Lambda(\lambda)$). It is
then trivially the case that the probability of an event in $\mathcal{C}$ conditioned on the
event $\Lambda$ is the probability of that event in the sCEG $\mathcal{C}_\Lambda$.
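
The edge-probability formula of Theorem 1 can be compared numerically with edge-probabilities computed directly from the conditioned route probabilities. The sketch below does this on a small event tree whose route probabilities are assumed values (chosen to match Example 1 with the life-expectancy edges omitted); the helper names `p`, `theorem1_edge_prob` and `direct_edge_prob` are hypothetical.

```python
from fractions import Fraction as F

# Assumed route probabilities for a small event tree.
routes = {
    ("male", "before", "C"): F(1, 16), ("male", "before", "no C"): F(1, 16),
    ("male", "after", "C"): F(1, 16), ("male", "after", "no C"): F(1, 16),
    ("male", "never"): F(1, 4),
    ("female", "before", "C"): F(1, 16), ("female", "before", "no C"): F(1, 16),
    ("female", "after", "C"): F(1, 16), ("female", "after", "no C"): F(1, 16),
    ("female", "never"): F(1, 4),
}
# The intrinsic conditioning event: all routes but one.
event = set(routes) - {("female", "after", "no C")}

def p(prefix, within=None):
    """Probability of routes beginning with `prefix` (optionally restricted to `within`)."""
    return sum(q for r, q in routes.items()
               if r[:len(prefix)] == prefix and (within is None or r in within))

def theorem1_edge_prob(prefix, label):
    """pi_hat = p(event | edge) / p(event | vertex) * pi, as in Theorem 1."""
    edge = prefix + (label,)
    pi = p(edge) / p(prefix)                          # original edge-probability
    p_event_given_edge = p(edge, event) / p(edge)
    p_event_given_vertex = p(prefix, event) / p(prefix)
    return p_event_given_edge / p_event_given_vertex * pi

def direct_edge_prob(prefix, label):
    """Edge-probability read off the conditioned route probabilities."""
    return p(prefix + (label,), event) / p(prefix, event)

print(theorem1_edge_prob(("female",), "after"))   # 1/7
print(direct_edge_prob(("female",), "after"))     # 1/7
```

Both functions agree on every edge that survives the conditioning, which is what the theorem asserts for this particular (assumed) distribution.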

2.7 Random variables on sCEGs


Random variables measurable with respect to the sigma field of $\mathcal{C}$ partition the
set of atoms into events. So consider a random variable $X$ with state space $\mathbb{X}$,
and let us denote the event that $X$ takes the value $x$ ($\in \mathbb{X}$) by $\Lambda_x$. Then the set
$\{\Lambda_x\}_{x \in \mathbb{X}}$ partitions $\Lambda(\mathcal{C})$.

For any CEG there is a set of fairly transparent random variables which 
includes as a subset the set of measurement variables of any BN-representation 
of the model, if such a representation exists. These are called cut-variables and 
are discussed in detail in Section 3.2. In Figure 4 for example, we have a variable
which could be called symptom, which could take the values 1, 2 & 3 (in some
order) for routes traversing edges labelled before, after and never. These are not
however the only variables we can define on a CEG, and we first consider some 
results for general variables. 

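Note that when we write $X \perp\!\!\!\perp Y$ we mean that $p(X = x, Y = y) =
p(X = x)\, p(Y = y)$ for all $x \in \mathbb{X}$, $y \in \mathbb{Y}$, and that this is true for all distributions
$P$ compatible with $\mathcal{C}$. Now for an intrinsic event $\Lambda$, we can write $X \perp\!\!\!\perp Y \mid \Lambda$ if
and only if $p(X = x, Y = y \mid \Lambda) = p(X = x \mid \Lambda)\, p(Y = y \mid \Lambda)$ for all values $x$
of $X$ and $y$ of $Y$ (see for example [12]). That is, $X \perp\!\!\!\perp Y \mid \Lambda \iff p(\Lambda_x, \Lambda_y \mid \Lambda) =
p(\Lambda_x \mid \Lambda)\, p(\Lambda_y \mid \Lambda)$ for all $\Lambda_x \in \{\Lambda_x\}_{x \in \mathbb{X}}$, $\Lambda_y \in \{\Lambda_y\}_{y \in \mathbb{Y}}$.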

Lemma 1. For a CEG $\mathcal{C}$, variables $X, Y$ measurable with respect to the sigma
field of $\mathcal{C}$, and intrinsic conditioning event $\Lambda$, the statement $X \perp\!\!\!\perp Y \mid \Lambda$ is true
if and only if $X \perp\!\!\!\perp Y$ is true in the sCEG $\mathcal{C}_\Lambda$.

The proof of this lemma is in the appendix. This is a particularly useful
property because it allows us to check any context-specific conditional independence
property by checking a non-conditional independence property on a
sub-sCEG.

To motivate the theory in the remainder of section 2 and in section 3, we 
need a bigger example. 

Example 2. Our researcher from Example 1 now turns her attention to an 
ongoing study. Subjects who display the symptom S (at any point) may be given 
a drug, and the probability of receiving this drug is not dependent on their gender 
or when they displayed S. Those that develop the condition C may be given 
treatment, and the probability of receiving this treatment is not dependent on 
their gender, when they displayed S, or whether or not they received the earlier 
drug. The CEG for this is given in Figure 5. 

These two properties are depicted in the CEG by the positions W 3 & being 
in the same stage, and by $w_{10}$, $w_{12}$ & $w_{14}$ also being in the same stage.

Figure 5 also tells us that treatment and life expectancy are independent of 
gender and when S displayed, given both the event drug given & develop C and 
the event drug given & not develop C (the positions $w_{10}$ & $w_{11}$); and that life
expectancy is independent of gender and when S displayed given the event drug 
not given, develop C & treatment given (the position w\%). 

Our researcher is interested in the relationships between condition & gender,
and between condition & when S displayed, for the subgroups (i) who were given 
the drug, and (ii) who displayed S but were not given the drug. 

In BN-theory, if we wish to answer the query $X \perp\!\!\!\perp Y \mid Z$?, one way we
might start doing this is by drawing the ancestral graph of $\{X, Y, Z\}$ (see for
example [?]). We do this because variables in the BN which are not part of
this graph have no influence on the outcome of our query.

There is no direct analogy for this graph in CEG-theory, but we can consider 
a pseudo-ancestral graph associated with a set of events or variables. So in 
Example 2, all edges associated with treatment or life expectancy lie downstream 
(i.e. towards the sink-node) of the edges associated with gender, symptom, drug
and condition, so we can simply curtail our CEG so that it does not include 
these edges. 

So in Figure 5, the positions $w_5$, $w_{10}$, $w_{11}$, $w_{13}$, $w_{14}$ & $w_{15}$ are coalesced
into a new sink-node $w_\infty$ as in Figure 6. But $w_7$ & $w_9$ in Figure 5 were in the
same stage. As these nodes are now only one edge upstream of $w_\infty$, they get
coalesced into a single new position ($w_7$ in Figure 6). Notice how much simpler
the pseudo-ancestral graph is than the original CEG.

We have noted above that stage-structure is often destroyed by conditioning
on an intrinsic event, but that the set of sCEGs is closed under this conditioning.
So the remainder of our analysis is conducted on an uncoloured CEG.

Figure 6: Pseudo-ancestral graph for Example 2

The graph in Figure 7 is the uncoloured pseudo-ancestral sCEG $\mathcal{C}$ associated
with the queries that our researcher is interested in. This graph is analogous to
the moralised ancestral graph used in BN-based analysis.

There are two natural variables which partition $\Lambda(\mathcal{C})$: these are gender
(which partitions $\Lambda(\mathcal{C})$ into events which we will call M (male) & F (female)), and
symptom (which partitions $\Lambda(\mathcal{C})$ into events which we will call B (S displayed
before puberty), A (S displayed after puberty) & N (S never displayed)).

The variable associated with giving the drug partitions $\Lambda(\mathcal{C})$ into three events:
drug given, S displayed but drug not given, and S not displayed and hence
drug not given. As the third of these events is exactly the event N above, we
will for brevity describe the second event (particularly when labelling edges)
simply as drug not given or no drug.

The variable associated with condition C partitions $\Lambda(\mathcal{C})$ into three events:
C developed, S displayed but C not developed, and S not displayed and hence
C not developed. Again, as the third of these events is exactly the event N, we
will for brevity describe the second event simply as C not developed.

There is no ambiguity here as the queries our researcher is interested in 
correspond to conditioning on the events drug given, and S displayed but drug 
not given. 

Her first question concerns the relationship between condition and when S
displayed for the subgroup who were given the drug. This requires conditioning
on the event drug given, so we draw the sub-sCEG $\mathcal{C}_\Lambda$ for this event. This is
given in Figure 8.

Figure 7: Pseudo-ancestral sCEG $\mathcal{C}$ for Example 2

The relationships she is interested in concern the probabilities $p_\Lambda(\text{C developed} \mid B)$
and $p_\Lambda(\text{C developed} \mid A)$, and these are given below:

$$
\begin{aligned}
p_\Lambda(\text{C developed} \mid B)
&= \big\{ p_\Lambda(M)\, p_\Lambda(B \mid M) \times 1 \times p_\Lambda(\text{C developed} \mid (M,B) \text{ or } (M,A) \text{ or } (F,B)) \\
&\qquad + p_\Lambda(F)\, p_\Lambda(B \mid F) \times 1 \times p_\Lambda(\text{C developed} \mid (M,B) \text{ or } (M,A) \text{ or } (F,B)) \big\} \\
&\qquad \div \big\{ p_\Lambda(M)\, p_\Lambda(B \mid M) + p_\Lambda(F)\, p_\Lambda(B \mid F) \big\} \\
&= p_\Lambda(\text{C developed} \mid (M,B) \text{ or } (M,A) \text{ or } (F,B)) \\
&= p(\text{C developed} \mid ((M,B) \text{ or } (M,A) \text{ or } (F,B)), \text{drug given}) \qquad (2.1)
\end{aligned}
$$

Note:

1. We do not need for our purposes here to evaluate the $p_\Lambda(\cdots)$ probabilities,
but if we wished to we could use the expression from Theorem 1.

2. The expression (2.1) is still the simplest expression even if we were to
reintroduce stage-structure and let $w_1$ & $w_2$ be in the same stage.

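$$
\begin{aligned}
p_\Lambda(\text{C developed} \mid A)
&= \big\{ p_\Lambda(M)\, p_\Lambda(A \mid M) \times 1 \times p_\Lambda(\text{C developed} \mid (M,B) \text{ or } (M,A) \text{ or } (F,B)) \\
&\qquad + p_\Lambda(F)\, p_\Lambda(A \mid F) \times 1 \times p_\Lambda(\text{C developed} \mid F, A) \big\} \\
&\qquad \div \big\{ p_\Lambda(M)\, p_\Lambda(A \mid M) + p_\Lambda(F)\, p_\Lambda(A \mid F) \big\}
\end{aligned}
$$

which clearly does not equal expression (2.1).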

3. The denominator is of course $p_\Lambda(A)$, but even if we let $w_1$ & $w_2$ be in the
same stage, the above expression only simplifies to

$$p_\Lambda(M)\, p_\Lambda(\text{C developed} \mid (M,B) \text{ or } (M,A) \text{ or } (F,B)) + p_\Lambda(F)\, p_\Lambda(\text{C developed} \mid F, A),$$

which still does not equal expression (2.1).

Figure 8: $\mathcal{C}_\Lambda$ for $\Lambda$ = {drug given}

Suppose we now consider the subgroup who displayed S but were not given
the drug and the sub-sCEG $\mathcal{C}_\Lambda$ for the event drug not given. This CEG is given
in Figure 9.

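The corresponding probabilities are:

$$
\begin{aligned}
p_\Lambda(\text{C developed} \mid B)
&= \big\{ p_\Lambda(M)\, p_\Lambda(B \mid M) \times 1 \times p_\Lambda(\text{C developed}) \\
&\qquad + p_\Lambda(F)\, p_\Lambda(B \mid F) \times 1 \times p_\Lambda(\text{C developed}) \big\} \\
&\qquad \div \big\{ p_\Lambda(M)\, p_\Lambda(B \mid M) + p_\Lambda(F)\, p_\Lambda(B \mid F) \big\} \\
&= p_\Lambda(\text{C developed}) = p(\text{C developed} \mid \text{drug not given})
\end{aligned}
$$

$$
\begin{aligned}
p_\Lambda(\text{C developed} \mid A)
&= \big\{ p_\Lambda(M)\, p_\Lambda(A \mid M) \times 1 \times p_\Lambda(\text{C developed}) \\
&\qquad + p_\Lambda(F)\, p_\Lambda(A \mid F) \times 1 \times p_\Lambda(\text{C developed}) \big\} \\
&\qquad \div \big\{ p_\Lambda(M)\, p_\Lambda(A \mid M) + p_\Lambda(F)\, p_\Lambda(A \mid F) \big\} \\
&= p_\Lambda(\text{C developed}) = p(\text{C developed} \mid \text{drug not given})
\end{aligned}
$$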

So we have that whether C developed is independent of when S displayed, given 
that S was displayed but the drug was not given. 

4. We do not need to consider the case where S never displayed, as this has
no intersection with $\Lambda$: Given that S displayed but drug not given, I know
that S was displayed, but further knowledge of when it was displayed is
irrelevant for prediction of whether or not the subject developed C.

5. This context-specific conditional independence property holds whether or
not we reintroduce stage-structure and let $w_1$ & $w_2$ be in the same stage.

Figure 9: $\mathcal{C}_\Lambda$ for $\Lambda$ = {drug not given}

Let us now consider our researcher’s other queries to do with the relationship
between condition and gender for our subgroups. From Figure 8 we can see that
if $\Lambda$ = {drug given} then:

$$p_\Lambda(\text{C developed} \mid M) = p_\Lambda(A \text{ or } B \mid M) \times 1 \times p_\Lambda(\text{C developed} \mid (M,B) \text{ or } (M,A) \text{ or } (F,B)) \qquad (2.2)$$

and $p_\Lambda(A \text{ or } B \mid M) = 1$, since A & B are the only edges leaving $w_1$ in $\mathcal{C}_\Lambda$.

$$
\begin{aligned}
p_\Lambda(\text{C developed} \mid F)
&= p_\Lambda(B \mid F) \times 1 \times p_\Lambda(\text{C developed} \mid (M,B) \text{ or } (M,A) \text{ or } (F,B)) \\
&\qquad + p_\Lambda(A \mid F) \times 1 \times p_\Lambda(\text{C developed} \mid F, A)
\end{aligned}
$$

which clearly does not equal expression (2.2), and this is true even if we reintroduce
stage-structure and let $w_1$ & $w_2$ be in the same stage.

From Figure 9 we can see that if $\Lambda$ = {drug not given} then:

$$
\begin{aligned}
p_\Lambda(\text{C developed} \mid M)
&= p_\Lambda(A \text{ or } B \mid M) \times 1 \times p_\Lambda(\text{C developed}) \\
&= p_\Lambda(\text{C developed}) = p(\text{C developed} \mid \text{drug not given})
\end{aligned}
$$

$$
\begin{aligned}
p_\Lambda(\text{C developed} \mid F)
&= p_\Lambda(B \mid F) \times 1 \times p_\Lambda(\text{C developed}) + p_\Lambda(A \mid F) \times 1 \times p_\Lambda(\text{C developed}) \\
&= p_\Lambda(\text{C developed}) = p(\text{C developed} \mid \text{drug not given})
\end{aligned}
$$

So we have that whether C developed is independent of gender, given that S was
displayed, but the drug was not given, and this is true irrespective of whether
we reintroduce stage-structure.

We notice that the topological feature which distinguishes Figure 9 from
Figure 8 is that in Figure 9 there is a cut-vertex (a single vertex not $w_0$ or $w_\infty$,
through which all routes in the graph pass) lying between the edges associated
with gender & symptom (upstream) and those associated with condition (downstream).
We return to cut-vertices and to their role in independence queries in
section 3.
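
The effect of a cut-vertex can be illustrated numerically. The sketch below builds a hypothetical sCEG (not one of the paper's figures) in which every route passes through a single position `w3`, with arbitrarily chosen edge-probabilities, and checks by enumeration whether pairs of edge-label variables factorise. It verifies the property for one distribution only, whereas the independence statements in the text are required to hold for all compatible distributions.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical sCEG with cut-vertex w3:
# position -> {edge label: (next position, edge-probability)}.
edges = {
    "w0": {"M": ("w1", F(2, 5)), "F": ("w2", F(3, 5))},
    "w1": {"B": ("w3", F(1, 3)), "A": ("w3", F(2, 3))},
    "w2": {"B": ("w3", F(3, 4)), "A": ("w3", F(1, 4))},
    "w3": {"C": ("w_inf", F(1, 5)), "no C": ("w_inf", F(4, 5))},
}

def routes(w="w0"):
    """Yield (edge labels, probability) for every route from w to the sink."""
    if w == "w_inf":
        yield (), F(1)
        return
    for label, (nxt, p) in edges[w].items():
        for labels, q in routes(nxt):
            yield (label,) + labels, p * q

def prob(pred):
    """Total probability of the routes whose labels satisfy `pred`."""
    return sum(p for labels, p in routes() if pred(labels))

def independent(i, j):
    """Check p(a, b) == p(a) p(b) for all values of the i-th and j-th edge labels."""
    vals_i = {labels[i] for labels, _ in routes()}
    vals_j = {labels[j] for labels, _ in routes()}
    return all(
        prob(lambda l: l[i] == a and l[j] == b)
        == prob(lambda l: l[i] == a) * prob(lambda l: l[j] == b)
        for a, b in product(vals_i, vals_j)
    )

print(independent(0, 2))  # gender vs condition, separated by w3: True
print(independent(1, 2))  # symptom vs condition, separated by w3: True
print(independent(0, 1))  # gender vs symptom, no cut-vertex between: False
```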

Note also that the example above gives ample justification for working with 
sCEGs when considering conditional independence queries, rather than their 
coloured counterparts. 


3 A separation theorem for simple CEGs 

In section 2.7 we introduced random variables on sCEGs. In section 3.1 we develop
this idea, before providing a separation theorem for sCEGs in section 3.2.


3.1 Position variables 

As noted in Section 1, modified BNs of one type or another are widely used because
real problems tend to contain more symmetries than can be represented
by a standard BN. What is generally not addressed in papers on these types
of graphs is the consequence that this extra structure has for the Markov relationships
between the problem variables. With CEGs we can address this
explicitly & automatically, and the first step towards doing this is to consider
model variables which are more fundamental than the measurement variables
customarily considered when working with BNs. So in this section we describe
two types of elementary random variables, measurable with respect to the sigma
field of $\mathcal{C}$, that can be identified with each position $w \in V(\mathcal{C}) \setminus \{w_\infty\}$. These are
the variables $\{I(w)\}$ and $\{X(w)\}$ defined below.

Note that when we say that a variable $X$ takes the value $x$, this is equivalent
to saying that an individual from our population has a development which we
equate with a route $\lambda$, and that this route $\lambda$ is an element of $\Lambda_x$, the event
corresponding to $X = x$.

For a position $w$, $I(w)$ can take the values 1 or 0 depending on whether this
individual is on a route $\lambda$ which does or does not pass through $w$. So:

$$
I(w) = \begin{cases} 1 & \text{if } w \in \lambda \\ 0 & \text{if } w \notin \lambda \end{cases}
$$

(where as above, $w \in \lambda$ means that the position $w$ lies on the route $\lambda$).

Up until now we have labelled edges by their start and endpoints (e.g.
$e(w,w')$), but we can also label the edges leaving a position $w$ by a set of
arbitrary labels of the form $e_x(w)$ ($x = 1, 2, \ldots$). We define $X(w)$ by:

$$
X(w) = \begin{cases} x & \text{if } e_x(w) \in \lambda \\ 0 & \text{if } w \notin \lambda \end{cases}
$$

So $X(w) = x$ ($\neq 0$) if our individual is on a route $\lambda$ which passes through $w$
and the edge $e_x(w)$.

Recall that a CEG depicts all possible histories of a unit in a population,
and gives a probability distribution over these histories. However, when a single
unit traverses one of the routes in the CEG, values are assigned to $I(w)$ & $X(w)$
for all positions $w \in V(\mathcal{C})$.
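
As an illustration of how a single route assigns values to every position variable, the following sketch uses a small hypothetical sCEG (the positions, edge labels and the function name `position_variables` are assumptions made for this example, not taken from the paper's figures). A route is described by the sequence of edge labels followed from $w_0$.

```python
# Hypothetical sCEG: each position maps an edge label x = 1, 2, ... to the next position.
edges = {
    "w0": {1: "w1", 2: "w2"},
    "w1": {1: "w3", 2: "w_inf"},
    "w2": {1: "w3", 2: "w3"},
    "w3": {1: "w_inf", 2: "w_inf"},
}

def position_variables(labels):
    """Follow the edge labels from w0 and return (I, X) for the resulting route.

    I[w] is 1 if the route passes through w and 0 otherwise;
    X[w] is the label of the edge taken out of w, or 0 if w is not on the route.
    """
    I = {w: 0 for w in edges}
    X = {w: 0 for w in edges}
    w = "w0"
    for x in labels:
        I[w], X[w] = 1, x
        w = edges[w][x]
    if w != "w_inf":
        raise ValueError("labels do not describe a complete route")
    return I, X

I, X = position_variables([1, 2])
print(I)  # {'w0': 1, 'w1': 1, 'w2': 0, 'w3': 0}
print(X)  # {'w0': 1, 'w1': 2, 'w2': 0, 'w3': 0}
```

In this representation $I(w) = 1$ exactly when $X(w) \neq 0$, so $I(w)$ is a function of $X(w)$.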

Notice that since $I(w)$ is clearly a function of $X(w)$, to specify a full joint
distribution over the position variables, it is sufficient to specify the joint distribution
of $\{X(w) : w \in V(\mathcal{C}) \setminus \{w_\infty\}\}$. Note also that all atoms $\lambda$ can be
expressed as an intersection:

$$\lambda = \bigcap_{w \in \lambda} \{X(w) = x_\lambda\},$$

and events in the sigma algebra of $\mathcal{C}$ as the union of these atoms:

$$\Lambda = \bigcup_{\lambda \in \Lambda} \Big\{ \bigcap_{w \in \lambda} \{X(w) = x_\lambda\} \Big\}$$

where $x_\lambda$ ($\neq 0$) is the unique value of $X(w)$ labelling the edge in the route $\lambda$.

Up until this point we have used the words upstream and downstream rather
loosely: in the context of sets of edges we have simply used these words to
mean further towards $w_0$ and further towards $w_\infty$; but we need to formalise the
meanings here in the context of positions. So when we say that $w_1$ is upstream
of $w_2$, or $w_2$ is downstream of $w_1$, we mean that $w_1 \prec w_2$.

For any set $A \subset V(\mathcal{C})$, let $X_A$ denote the set of random variables $\{X(w) :
w \in A\}$ and $I_A$ the set $\{I(w) : w \in A\}$. Also, for any $w \in V(\mathcal{C})$, let $U(w)$ be
the set of positions in $V(\mathcal{C})$ which lie upstream of the position $w$, $D(w)$ the set
of positions which lie downstream of $w$, $U^c(w)$ the set of positions which do not
lie upstream of $w$, and $D^c(w)$ the set of positions which do not lie downstream
of $w$.

Lemma 2. For any sCEG $\mathcal{C}$ and position $w \in V(\mathcal{C}) \setminus \{w_\infty\}$, the variables
$I(w), X(w)$ exhibit the position independence property that

$$X(w) \perp\!\!\!\perp X_{D^c(w)} \mid I(w)$$

This result (an extension of the Limited Memory Lemma of [37]) is analogous
to the Directed Markov property which can be used to define BNs (see
for example [?]), and which states that a BN vertex-variable is independent of
its non-descendants given its parents. It provides a set of conditional independence
statements that can simply be read from the graph, one for each position
in $V(\mathcal{C})$. The proof of the lemma is in the appendix.

The statement that $X(w) \perp\!\!\!\perp X_{D^c(w)} \mid (I(w) = 1)$ can be read as: Given
a unit reaches a position $w \in V(\mathcal{C})$, whatever happens immediately after $w$
is independent of not only all developments through which that position was
reached, but also of all positions that logically have not happened or could not
now happen because the unit has passed through $w$.

3.2 Theorem and corollaries 

It is doubtful whether BNs would have enjoyed their enormous popularity if it
were not so apparently easy to read conditional independence properties from
them. In particular, the existence of the d-separation theorem [22] [35] has
allowed all practitioners to make some attempt at model interpretation with
some degree of confidence.

The presence of any context-specific conditional independence structure, however,
severely hampers analysts using BNs in their attempts to get accurate
pictures of the structure of their problems [7] [29]. In earlier sections of this
paper (and in particular in section 2.7) we have been developing the theory 
needed for reading and representing (context-specific) conditional independence 
structure using CEGs. In particular, Lemma 1 allows us to consider context- 
specific queries by looking at the relevant sub-CEG; and Example 2 provides 
the rationale for looking at sCEGs. We now provide a separation theorem for 
sCEGs. 

Using the standard terminology of non-probabilistic graph theory, we call
a position $w \in V(\mathcal{C}) \setminus \{w_\infty\}$ a cut-vertex if the removal of $w$ and its associated
edges from $\mathcal{C}$ would result in a graph with two disconnected components. An
alternative description would be a position other than $w_0$ through which all
routes pass. We also remind readers at this point that when we write (for
example) $X \perp\!\!\!\perp Y$ we mean that $p(X = x, Y = y) = p(X = x)\, p(Y = y)$
for all $x \in \mathbb{X}$, $y \in \mathbb{Y}$, and that this is true for all distributions $P$ compatible with $\mathcal{C}$.

Theorem 2. In an sCEG $\mathcal{C}$ with $w_1, w_2 \in V(\mathcal{C}) \setminus \{w_\infty\}$ and $w_2 \nprec w_1$,
$X(w_1) \perp\!\!\!\perp X(w_2)$ if and only if either (i) there exists a cut-vertex $w$ such that
$w_1 \prec w \prec w_2$, or (ii) $w_2$ is itself a cut-vertex.

The proof of this theorem is in the appendix. The variables $\{I(w)\}$ and
$\{X(w)\}$ have an obvious intrinsic mathematical interest, but for more practical
purposes we need to be able to make statements about the relationships between
variables which are more closely analogous to the measurement variables used
in BN-based analysis. So, in the same way that our primitive probabilities were
used to build probabilities of subpaths and routes, we can use the $X(w)$ variables
to build new bigger variables which have a more transparent interpretation for
the analyst.
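
The topological criterion of Theorem 2 can be sketched as a short graph routine. The example below is an illustration under stated assumptions: the graph is a hypothetical sCEG given as an adjacency list with parallel edges collapsed, the function names are invented for this sketch, and `separated` presupposes (as the theorem does) that $w_2$ is not upstream of $w_1$.

```python
# Hypothetical sCEG as an adjacency list (parallel edges collapsed).
graph = {
    "w0": ["w1", "w2"],
    "w1": ["w3"],
    "w2": ["w3"],
    "w3": ["w4", "w5"],
    "w4": ["w_inf"],
    "w5": ["w_inf"],
    "w_inf": [],
}

def routes(graph, start="w0", sink="w_inf"):
    """Return every start-to-sink path as a list of positions."""
    if start == sink:
        return [[sink]]
    return [[start] + rest for nxt in graph[start] for rest in routes(graph, nxt, sink)]

def cut_vertices(graph):
    """Positions other than w0 and w_inf that lie on every route."""
    common = set.intersection(*(set(r) for r in routes(graph)))
    return common - {"w0", "w_inf"}

def upstream(graph, a, b):
    """True if a strictly precedes b, i.e. b is reachable from a by at least one edge."""
    return any(nxt == b or upstream(graph, nxt, b) for nxt in graph[a])

def separated(graph, w1, w2):
    """Theorem 2 criterion for X(w1) and X(w2), assuming w2 is not upstream of w1."""
    cuts = cut_vertices(graph)
    return w2 in cuts or any(
        upstream(graph, w1, w) and upstream(graph, w, w2) for w in cuts
    )

print(cut_vertices(graph))            # {'w3'}
print(separated(graph, "w1", "w4"))   # True: w1 precedes w3, which precedes w4
print(separated(graph, "w0", "w3"))   # True: w3 is itself a cut-vertex
print(separated(graph, "w0", "w1"))   # False: no cut-vertex between w0 and w1
```

Enumerating all routes is exponential in general; it is used here only because it mirrors the definition of a cut-vertex as a position through which all routes pass.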

In Figure 4, let $X(w_i)$ (for $i = 5, 6, 7, 8, 9$) equal 1 if an individual has a
development which takes them through the position $w_i$ and then they die before
the age of 50, and equal 2 if they have a development which takes them through
the position $w_i$ and then they die after the age of 50. Since an individual’s
development will take them through one & only one of $\{w_5, w_6, w_7, w_8, w_9\}$, we
can define a life expectancy indicator across the whole CEG by

$$\sup_{w_i \,:\, i \in \{5,6,7,8,9\}} X(w_i)$$

which takes the value 1 if an individual dies before the age of 50, or 2 if they
die after the age of 50.

Analogously with the idea of a cut-vertex, a position cut is a set of positions
the removal of which from $V(\mathcal{C})$ would result in a graph with two disconnected
components. This is formalised in Definition 6.

Definition 6. Position cut. A set of positions $W \subset V(\mathcal{C}) \setminus \{w_0, w_\infty\}$ is a
position cut if $\{\Lambda(w) : w \in W\}$ forms a partition of $\Lambda(\mathcal{C})$.

As noted above, for any position cut $W$, we can define a cut-variable; this is
formalised in Definition 7.

Definition 7. Cut-variable. For a position cut $W$, the random variable
$X(W) = \sup_{w \in W} X(w)$ is called a cut-variable.

Note that $X(W)$ can also be defined as $X(W) = \sum_{w \in W} X(w)$. The equivalence
of the two forms comes from the fact that $X(w) > 0$ for one & only one position
$w \in W$.
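
Definitions 6 and 7 can be sketched in code. The example below assumes a small hypothetical sCEG with labelled edges (not one of the paper's figures) and treats a set $W$ as a position cut when every route passes through exactly one of its positions, which is one way of expressing that $\{\Lambda(w) : w \in W\}$ partitions the set of routes. The function names are illustrative.

```python
# Hypothetical sCEG: position -> {edge label: next position}.
edges = {
    "w0": {1: "w1", 2: "w2"},
    "w1": {1: "w3", 2: "w4"},
    "w2": {1: "w3", 2: "w4"},
    "w3": {1: "w_inf", 2: "w_inf"},
    "w4": {1: "w_inf"},
}

def all_routes(w="w0"):
    """Every route from w to the sink, as a list of (position, edge label) pairs."""
    if w == "w_inf":
        return [[]]
    return [[(w, x)] + rest for x, nxt in edges[w].items() for rest in all_routes(nxt)]

def is_position_cut(W):
    """True if every route passes through exactly one position in W."""
    return all(sum(1 for w, _ in route if w in W) == 1 for route in all_routes())

def cut_variable(W, route):
    """X(W) = sup over w in W of X(w), with X(w) = 0 for positions off the route."""
    X = dict(route)
    return max(X.get(w, 0) for w in W)

print(is_position_cut({"w1", "w2"}))   # True
print(is_position_cut({"w3", "w4"}))   # True
print(is_position_cut({"w1", "w3"}))   # False: some routes meet both, some neither
print(cut_variable({"w3", "w4"}, [("w0", 1), ("w1", 2), ("w4", 1)]))  # 1
```

Because exactly one $X(w)$ with $w \in W$ is non-zero on any route when $W$ is a position cut, taking the maximum and taking the sum give the same value.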

In Figure 4 we have the obvious cut-variables gender and symptom. If we
assign values of 1 to edges labelled develop C, 2 to not develop C, 3 to die before
50, and 4 to die after 50, then $X(W)$ for $W = \{w_3, w_4, w_5\}$ becomes a more
sophisticated cut-variable for developing the condition: $X(W)$ takes the value 1
if & only if an individual develops C, but $X(W) = 2$ tells us that an individual
displayed symptom S yet did not develop C, and $X(W) = 3$ or 4 tells us that
an individual did not display S and therefore did not develop C.

Theorem 2 allows us to look at the detail of the Markov structure depicted 
by our CEGs. The following corollaries allow us to get a broader picture. 

Corollary 1. For an sCEG $\mathcal{C}$ with position cuts $W_a$ and $W_b$, the property
$X(w_1) \perp\!\!\!\perp X(w_2)$ holding for any $w_1 \in W_a$, $w_2 \in W_b$ implies that $X(W_a) \perp\!\!\!\perp X(W_b)$.

So, as one might expect, the presence of a cut-vertex in an sCEG renders 
cut-variables upstream of this vertex independent of cut-variables downstream 
of the vertex. The proof of the corollary is in the appendix. 

As already noted, CEGs have been designed for the representation and analysis
of asymmetric problems; and for symmetric problems a graph such as a BN
is more appropriate. But it is clear that where a problem can also be adequately
represented as a BN (without too much context-specific structure), the set of
cut-variables of a CEG-representation must contain the set of variables associated
with the vertices of the BN, as these are simply the measurement variables
of the problem. Hence, if an sCEG $\mathcal{C}$ represents a model which admits a product
space structure, and $M, N$ are measurement variables of the model associated
with position cuts $W_M, W_N$, then the property $M \perp\!\!\!\perp N$ holds providing that
$X(w_m) \perp\!\!\!\perp X(w_n)$ for any $w_m \in W_M$, $w_n \in W_N$. This result follows immediately
from Corollary 1.

Of more interest to analysts of asymmetric problems is the result given in 
Corollary 2, which ties together the ideas presented in Corollary 1 and Lemma 1. 

Corollary 2. Let $\mathcal{C}$ be a CEG with position cuts $W_a, W_b$, and $\Lambda$ an event
intrinsic to $\mathcal{C}$. If, in the sCEG $\mathcal{C}_\Lambda$, there exists a cut-vertex $w$ such that
$W_a \prec w \prec W_b$, then $X(W_a) \perp\!\!\!\perp X(W_b) \mid \Lambda$.

The proof of this corollary is in the appendix. We can immediately deduce
that if a CEG $\mathcal{C}$ represents a model which admits a product space structure,
$M, N$ are measurement variables of the model associated with position cuts
$W_M, W_N$, and $\Lambda$ is an event intrinsic to $\mathcal{C}$, then if in the sCEG $\mathcal{C}_\Lambda$ there exists
a cut-vertex $w$ such that $W_M \prec w \prec W_N$, the property $M \perp\!\!\!\perp N \mid \Lambda$ must hold.

Recall from Section 2.7 that for a measurement variable $X$ with state space $\mathbb{X}$,
the event that $X$ takes the value $x$ ($\in \mathbb{X}$) is denoted by $\Lambda_x$, and the set $\{\Lambda_x\}_{x \in \mathbb{X}}$
partitions $\Lambda(\mathcal{C})$. So the query $M \perp\!\!\!\perp N \mid X$? can be answered by checking the
queries $M \perp\!\!\!\perp N \mid \Lambda_x$? for each $x \in \mathbb{X}$. If our problem elicitation indicates that
there are no context-specific variations in independence properties connected
with conditioning on the variable $X$, we can answer the query $M \perp\!\!\!\perp N \mid X$? by
looking at a single graph $\mathcal{C}_{\Lambda_x}$ for some convenient value $X = x$.

Moreover, although this argument has been constructed under the assumptions
that $\mathcal{C}$ admits a product space structure, and that $M$, $N$ & $X$ are measurement
variables of the problem, these assumptions are not strictly necessary;
it is sufficient that $M$ & $N$ are cut-variables, and that $\{\Lambda_x\}_{x \in \mathbb{X}}$ partitions $\Lambda(\mathcal{C})$.
And even these conditions can be relaxed, as we see in Example 3.

Example 3. An alternative drug becomes available, resulting in a revised sCEG 
as in Figure 10. 

Let $W_a = \{w_0\}$, $W_b = \{w_1, w_2\}$, $W_c = \{w_3, w_4\}$ and $W_d = \{w_5, w_6, w_7, w_8\}$.
Now, unlike $W_b$, the sets $W_c$ & $W_d$ are not position-cuts as they do not partition
$\Lambda(\mathcal{C})$. However, we can still define

$$X(W_c) = \sup_{w \in W_c} X(w), \qquad X(W_d) = \sup_{w \in W_d} X(w)$$

$X(W_c), X(W_d)$ (although not cut-variables) are both measurable with respect
to the sigma field of $\mathcal{C}$, but can, unlike $X(W_a)$ or $X(W_b)$, take zero values, if a
patient does not display the symptom.

If we let $\Lambda_1$ be the event S displayed but drug not given, then we get the
sub-sCEG $\mathcal{C}_{\Lambda_1}$ shown in Figure 9, from which we can read the statement

$$(X(W_a), X(W_b)) \perp\!\!\!\perp X(W_d) \mid \Lambda_1.$$

Figure 10: sCEG for Example 3

If we let $\Lambda_2$ be the event old drug given, then we get the graph $\mathcal{C}_{\Lambda_2}$ shown in
Figure 8, and as we have already shown,

$$(X(W_a), X(W_b)) \not\perp\!\!\!\perp X(W_d) \mid \Lambda_2.$$

If we let $\Lambda_3$ be the event new drug given, then we get a graph $\mathcal{C}_{\Lambda_3}$ which differs
from that in Figure 9 only in that the cut-vertex is now $w_8$, not $w_7$. We can
then read the statement

$$(X(W_a), X(W_b)) \perp\!\!\!\perp X(W_d) \mid \Lambda_3.$$

Note that $\{\Lambda_i\}_{i=1,2,3}$ here does not partition $\Lambda(\mathcal{C})$.

Clearly we can call $X(W_a)$ & $X(W_b)$ gender ($X_G$) & symptom ($X_S$). If we
let $X(w_3)$ & $X(w_4)$ take the values 1, 2 & 3 for the outcomes no drug, old drug &
new drug, then $X(W_c)$ takes the values 0, 1, 2 & 3 for did not display S so did not
receive drug, displayed S but did not receive drug, received old drug and received
new drug. So there is also no ambiguity in calling $X(W_c)$ drug ($X_D$). Taking a
similar approach to $\{X(w_i)\}_{i \in \{5,6,7,8\}}$ we find that there is also no ambiguity in
calling $X(W_d)$ condition ($X_C$), and (since $X(W_c) = 0 \Rightarrow X(W_d) = 0$) collecting
these statements together gives the property

$$X_C \perp\!\!\!\perp (X_G, X_S) \mid (X_D \neq 2),$$

i.e. condition is independent of gender & symptom given that the subject did not receive the
old drug.

4 Discussion


Chain Event Graphs were introduced for the representation & analysis of problems
for which the use of Bayesian Networks is not ideal. The class of models
expressible as a CEG includes as a proper subset the class of models expressible
as faithful regular or context-specific BNs on finite variables. Unlike the BN, the
CEG embodies the structure of the model state space and any context-specific
information in its topology. In this paper we have justified the use of sCEGs
for investigating context-specific conditional independence queries of the form
$X \perp\!\!\!\perp Y \mid \Lambda$?, and provided a separation theorem for sCEGs and position variables.
The introduction of cut-variables (analogous to BN measurement variables,
but more flexible) provides a repertoire of techniques which will enable
researchers to tackle a comprehensive collection of conditional independence enquiries
on models of asymmetric problems for which the available quantitative
dependence information cannot all be embodied in the DAG of a BN.

The research that led to this paper also yielded a number of other questions,
some of which are discussed here. The most obvious of these is: does the "only
if" part of Theorem 2 hold if we allow constraints on a CEG’s edge-probabilities
such as two edge-probabilities being equal? The short answer is no, but the
problem is somewhat more subtle than this answer suggests. Some preliminary
work on this is described in [40], but a more comprehensive analysis awaits a
future paper.

For illustrative convenience the CEGs in the examples in this paper have 
been constructed in temporal order, but this is not the only valid ordering of 
a CEG. In [42] for instance, we had a CEG representing a police investigation 
where the order of events was that in which the police took action or discovered 
evidence (Extensive Form order [35]). At the simplest level, there are valid 
reorderings of a CEG in which the cut-variables appear in a different sequence, 
and there is a set of rules governing when adjacent cut-variables can be swapped 
to produce a different valid ordering. For CEGs depicting models which have a 
natural product space structure with no context-specific anomalies, these rules
are relatively straightforward, but for more general CEGs where we might need 
to consider swaps of sets of adjacent edges rather than of cut-variables, the 
rules become very complex. However, it seems fairly certain that if two cut- 
variables in a coloured CEG are independent then there is a valid reordered 
pseudo-ancestral CEG of these variables in which the variables are separated by 
a cut-vertex. We hope to shed more light on this in a future paper.

In [5] we have also looked at infinite CEGs where an individual might come 
back to (essentially) the same state at some future time point. These problems 
can be expressed as a CEG analogous to the 2-time-sliced Dynamic BN [?], or
as a graph which is no longer acyclic. Both representations involve modification 
to the rules governing conditional independence structure. This is discussed 
in [5], but there is an opportunity here for developing CEG semantics further. 

Appendix: Proofs

Proof of Theorem 1: 

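Consider the underlying tree $\mathcal{T}$ of the CEG $\mathcal{C}(\mathcal{T})$. The event $\Lambda$ corresponds to a
union of routes of $\mathcal{T}$. Let $\mathcal{T}_\Lambda$ be the reduced tree consisting only of the vertices,
edges & routes that comprise $\Lambda$.

If we denote the probabilities of events in $\mathcal{T}_\Lambda$ by $p_\Lambda(\cdot)$, then clearly we require
that $p_\Lambda(\lambda) = p(\lambda \mid \Lambda)$.

Once route probabilities in a tree are given, edge-probabilities are uniquely
defined. So letting the edge-probabilities in $\mathcal{T}_\Lambda$ be denoted by $\hat{\pi}_e(v' \mid v)$, and
letting the route $\lambda \in \Lambda$ be described by its edges as:

$$\lambda = \Lambda(e(v_0, v_1)) \cap \Lambda(e(v_1, v_2)) \cap \cdots \cap \Lambda(e(v_p, v_q)),$$

we have:

$$
\begin{aligned}
p_\Lambda(\lambda) &= p(\lambda \mid \Lambda) \\
&= p(\Lambda(e(v_0,v_1)), \Lambda(e(v_1,v_2)), \ldots, \Lambda(e(v_p,v_q)) \mid \Lambda) \\
&= p(\Lambda(e(v_p,v_q)) \mid \Lambda(e(v_0,v_1)), \Lambda(e(v_1,v_2)), \ldots, \Lambda) \\
&\qquad \times \cdots \times p(\Lambda(e(v_1,v_2)) \mid \Lambda(e(v_0,v_1)), \Lambda) \times p(\Lambda(e(v_0,v_1)) \mid \Lambda) \\
&= p_\Lambda(\Lambda(e(v_p,v_q)) \mid \Lambda(e(v_0,v_1)), \Lambda(e(v_1,v_2)), \ldots) \times \cdots \times p_\Lambda(\Lambda(e(v_0,v_1))) \\
&= p_\Lambda(\Lambda(e(v_p,v_q)) \mid \Lambda(v_p)) \times \cdots \times p_\Lambda(\Lambda(e(v_1,v_2)) \mid \Lambda(v_1)) \times p_\Lambda(\Lambda(e(v_0,v_1))) \\
&\qquad \text{using the Markov property of trees from section 2.2} \\
&= \prod_{e(v,v') \subset \lambda} p_\Lambda(\Lambda(e(v,v')) \mid \Lambda(v)) \\
&= \prod_{e(v,v') \subset \lambda} \hat{\pi}_e(v' \mid v)
\end{aligned}
$$

If we now let $\mathcal{C}_\Lambda$ inherit the edge-probabilities from $\mathcal{T}_\Lambda$, we have:

$$
p_\Lambda(\lambda) = \prod_{e(w,w') \in \lambda} \hat{\pi}_e(w' \mid w) = \prod_{e(w,w') \subset \lambda} p_\Lambda(\Lambda(e(w,w')) \mid \Lambda(w))
$$

where, without ambiguity, we let $p_\Lambda(\cdot)$ denote the probability of an event in $\mathcal{C}_\Lambda$.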
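Then

$$
\begin{aligned}
\hat{\pi}_e(w' \mid w) &= p_\Lambda(\Lambda(e(w,w')) \mid \Lambda(w)) \\
&= p(\Lambda(e(w,w')) \mid \Lambda(w), \Lambda) \\
&= \frac{p(\Lambda(e(w,w')), \Lambda(w), \Lambda)}{p(\Lambda(w), \Lambda)} \\
&= \frac{p(\Lambda \mid \Lambda(e(w,w')), \Lambda(w))\; p(\Lambda(e(w,w')), \Lambda(w))}{p(\Lambda \mid \Lambda(w))\; p(\Lambda(w))} \\
&= \frac{p(\Lambda \mid \Lambda(e(w,w')))\; p(\Lambda(e(w,w')), \Lambda(w))}{p(\Lambda \mid \Lambda(w))\; p(\Lambda(w))}
\qquad \text{since } \Lambda(e(w,w')) \subset \Lambda(w) \\
&= \frac{p(\Lambda \mid \Lambda(e(w,w')))}{p(\Lambda \mid \Lambda(w))} \; \pi_e(w' \mid w)
\end{aligned}
$$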


Under this edge-probability assignment, no edges in $\mathcal{C}_\Lambda$ are given a zero probability,
since each $e(w,w') \in \lambda \in \Lambda$. And no position in $\mathcal{C}_\Lambda$ needs to be split
(uncoalesced) in order for us to make this edge-probability assignment.

By construction two vertices in a tree on the same route cannot be in the
same position. So consider two vertices in $\mathcal{C}_\Lambda$ which do not lie on the same route.
Then the collections of routes (elements of $\Lambda$) passing through each of these
vertices are disjoint. So we can assign the probability distribution over these
routes (in $\mathcal{C}$) in such a way that the conditional joint probability distributions
on the subpaths emanating from these two vertices in $\mathcal{C}_\Lambda$ are different. Hence
our assignment does not require us to coalesce distinct positions in $\mathcal{C}_\Lambda$.

So position-structure is preserved.

Hence $\mathcal{C}_\Lambda$ is an sCEG, and the set of sCEGs is closed under conditioning on
an intrinsic event.


Proof of Lemma 1: 

Variables $X, Y$ partition the set of atoms of $C$, and since $\Lambda \subset \Lambda(C)$, $X, Y$ also partition the set of atoms of $C_\Lambda$.

Consider arbitrary events $\Lambda_x$, $\Lambda_y$ from $\{\Lambda_x\}_{x \in \mathbb{X}}$, $\{\Lambda_y\}_{y \in \mathbb{Y}}$, and the event $\Lambda_x \cap \Lambda_y$. Then $p(\Lambda_x \mid \Lambda) = p_\Lambda(\Lambda_x)$ etc., and the statement

$$p(\Lambda_x, \Lambda_y \mid \Lambda) = p(\Lambda_x \mid \Lambda) \, p(\Lambda_y \mid \Lambda)$$

is true if and only if the statement

$$p_\Lambda(\Lambda_x, \Lambda_y) = p_\Lambda(\Lambda_x) \, p_\Lambda(\Lambda_y)$$

is true. If either of these relationships holds for all $\Lambda_x \in \{\Lambda_x\}_{x \in \mathbb{X}}$, $\Lambda_y \in \{\Lambda_y\}_{y \in \mathbb{Y}}$, then so does the other.

Hence $X \perp\!\!\!\perp Y \mid \Lambda$ if and only if $X \perp\!\!\!\perp Y$ in $C_\Lambda$. □


Proof of Lemma 2:

1. Consider a single route $\lambda$ consisting of a subpath $\mu_0(w_0, w)$ between $w_0$ and $w$, the edge $e(w, w')$ labelled $x$ ($\neq 0$), and a subpath $\mu_1(w', w_\infty)$ connecting $w'$ to $w_\infty$.

Now this route consists of a set of edges and by construction the probability $p(\lambda)$ of the route is equal to the product of the probabilities labelling each of these edges. Moreover, the probability of any subpath of $\lambda$ is equal to the product of the probabilities labelling each of its edges. So $p(\lambda)$ can be written as the product of the probabilities of three subpaths: $\mu_0(w_0, w)$, $e(w, w')$ and $\mu_1(w', w_\infty)$. Thus:

$$p(\lambda) = \pi_{\mu_0}(w \mid w_0) \, \pi_e(w' \mid w) \, \pi_{\mu_1}(w_\infty \mid w').$$

But the fact that $\lambda$ utilises the subpath $\mu_0(w_0, w)$ between $w_0$ and $w$ allows us to completely specify the value of the vector $X_{U(w)}$. By a slight abuse of notation we can represent this as $X_{U(w)} = \mu_0(w_0, w)$.

Consider now the event $(X_{U(w)} = \mu_0(w_0, w),\ I(w) = 1,\ X(w) = x)$, which is the union of all $w_0 \to w_\infty$ routes which utilise the subpath $\mu_0(w_0, w)$ and the edge $e(w, w')$. Then since this is an intrinsic event we can write:

$$p(X_{U(w)} = \mu_0(w_0, w),\ I(w) = 1,\ X(w) = x)$$

$$= p(\Lambda(\mu_0(w_0, w)),\ \Lambda(w),\ \Lambda(e(w, w')))$$

$$= \pi_{\mu_0}(w \mid w_0) \, \pi_e(w' \mid w) \sum_{\mu_1 \in M_1} \pi_{\mu_1}(w_\infty \mid w'),$$

where $M_1$ is the set of all subpaths from $w'$ to $w_\infty$. But $\sum_{\mu_1 \in M_1} \pi_{\mu_1}(w_\infty \mid w') = 1$ since all paths through $w'$ terminate in $w_\infty$.
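
The two facts used here, that a route probability is the product of its edge-probabilities and that the subpath probabilities from any position to the sink sum to one, can be illustrated by enumeration. This is a minimal sketch on a hypothetical toy graph; `CEG`, `subpaths`, `path_prob` and `reach_prob` are names introduced for illustration only.

```python
from math import prod  # available from Python 3.8

# Hypothetical toy CEG: position -> list of (edge label, child position, edge probability).
CEG = {
    "w0": [(1, "w1", 0.3), (2, "w2", 0.7)],
    "w1": [(1, "w3", 0.4), (2, "w3", 0.6)],
    "w2": [(1, "w3", 0.5), (2, "w3", 0.5)],
    "w3": [(1, "w4", 0.2), (2, "w_inf", 0.8)],
    "w4": [(1, "w_inf", 0.9), (2, "w_inf", 0.1)],
}


def subpaths(ceg, start, end):
    """Yield every directed subpath from start to end as a tuple of (position, label, child, prob)."""
    if start == end:
        yield ()
        return
    for label, child, p in ceg.get(start, []):
        for rest in subpaths(ceg, child, end):
            yield ((start, label, child, p),) + rest


def path_prob(path):
    """Probability of a subpath: the product of the probabilities on its edges."""
    return prod(edge[3] for edge in path)


def reach_prob(ceg, start, end):
    """Sum of the subpath probabilities over all subpaths from start to end."""
    return sum(path_prob(mu) for mu in subpaths(ceg, start, end))


# The event "upstream subpath mu0 was used, then the edge labelled 1 out of w3".
mu0 = (("w0", 1, "w1", 0.3), ("w1", 2, "w3", 0.6))
edge = ("w3", 1, "w4", 0.2)
EVENT_PROB = sum(
    path_prob(route)
    for route in subpaths(CEG, "w0", "w_inf")
    if route[:2] == mu0 and route[2] == edge
)
print(EVENT_PROB, path_prob(mu0) * edge[3])
```

Because the downstream sum is one, the event probability reduces to the product of the upstream subpath probability and the single edge-probability, which is the form used in the next step of the proof.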

Similarly, for the event $(X_{U(w)} = \mu_0(w_0, w),\ I(w) = 1)$ we have:

$$p(X_{U(w)} = \mu_0(w_0, w),\ I(w) = 1) = p(\Lambda(\mu_0(w_0, w)),\ \Lambda(w)) = \pi_{\mu_0}(w \mid w_0).$$

So

$$p(X(w) = x \mid X_{U(w)} = \mu_0(w_0, w),\ I(w) = 1) = \frac{\pi_{\mu_0}(w \mid w_0) \, \pi_e(w' \mid w)}{\pi_{\mu_0}(w \mid w_0)}$$

$$= \pi_e(w' \mid w) = p(\Lambda(e(w, w')) \mid \Lambda(w))$$

$$= p(X(w) = x \mid I(w) = 1).$$

Hence

$$X(w) \perp\!\!\!\perp X_{U(w)} \mid (I(w) = 1) \qquad (1)$$

2. If $I(w) = 1$ then $X(w') = 0$ for all $w' \in D^c(w) \cap U^c(w)$, so we can completely specify the value of the vector $X_{D^c(w) \cap U^c(w)}$, and expression (1) implies:

$$X(w) \perp\!\!\!\perp X_{U(w)} \mid (X_{D^c(w) \cap U^c(w)},\ I(w) = 1) \qquad (2)$$

Moreover if $I(w) = 1$, no further information about $X_{D^c(w) \cap U^c(w)}$ will assist us in predicting the value of $X(w)$. Hence

$$X(w) \perp\!\!\!\perp X_{D^c(w) \cap U^c(w)} \mid (I(w) = 1) \qquad (3)$$

Using a result from m, the expressions (2) and (3) yield the result:

$$X(w) \perp\!\!\!\perp (X_{D^c(w) \cap U^c(w)},\ X_{U(w)}) \mid (I(w) = 1)$$

$$\Rightarrow\ X(w) \perp\!\!\!\perp X_{D^c(w)} \mid (I(w) = 1)$$

3. If $I(w) = 0$ then $X(w) = 0$, and no further information about $X_{D^c(w)}$ will assist us in predicting the value of $X(w)$. Hence also

$$X(w) \perp\!\!\!\perp X_{D^c(w)} \mid (I(w) = 0) \qquad \square$$

Proof of Theorem 2: 

1. Sufficient conditions for independence. The sufficient conditions for 
independence are an almost immediate consequence of similar results for Markov 
processes, but we include a proof here for completeness. 

Consider an sCEG $C$, and two positions $w_1, w_2 \in V(C) \setminus \{w_\infty\}$ such that $w_1 \prec w \prec w_2$ for some cut-vertex $w$. By construction $I(w_1) \not\equiv 0$, $I(w_2) \not\equiv 0$, $I(w) = 1$.

Consider the event $(X(w_1) = x_1, X(w_2) = x_2) = (X(w_1) = x_1, I(w) = 1, X(w_2) = x_2)$ for $x_1 \neq 0$, $x_2 \neq 0$. This is the union of all routes passing through $w_1$, utilising an edge $e(w_1, w_1')$ labelled $x_1$, passing through $w$, passing through $w_2$, and utilising an edge $e(w_2, w_2')$ labelled $x_2$. By analogy with the proof of Lemma 2 we can, since this is an intrinsic event, write:

$$p(X(w_1) = x_1, X(w_2) = x_2) = \sum_{\mu_0 \in M_0} \pi_{\mu_0}(w_1 \mid w_0) \; \pi_e(w_1' \mid w_1) \sum_{\mu_1 \in M_1} \pi_{\mu_1}(w \mid w_1')$$

$$\times \sum_{\mu_2 \in M_2} \pi_{\mu_2}(w_2 \mid w) \; \pi_e(w_2' \mid w_2) \sum_{\mu_3 \in M_3} \pi_{\mu_3}(w_\infty \mid w_2')$$

where $M_0$ is the set of all subpaths from $w_0$ to $w_1$, $M_1$ is the set of all subpaths from $w_1'$ to $w$, $M_2$ is the set of all subpaths from $w$ to $w_2$, and $M_3$ is the set of all subpaths from $w_2'$ to $w_\infty$. But $\sum_{\mu_0 \in M_0} \pi_{\mu_0}(w_1 \mid w_0)$ is simply the probability of reaching $w_1$ from $w_0$ etc., so this equals

$$= \pi(w_1 \mid w_0) \, \pi_e(w_1' \mid w_1) \, \pi(w \mid w_1') \, \pi(w_2 \mid w) \, \pi_e(w_2' \mid w_2) \, \pi(w_\infty \mid w_2')$$

$$= \pi(w_1 \mid w_0) \, \pi_e(w_1' \mid w_1) \times 1 \times \pi(w_2 \mid w) \, \pi_e(w_2' \mid w_2) \times 1$$

Similarly for the event $X(w_1) = x_1$, we can write:

$$p(X(w_1) = x_1) = \pi(w_1 \mid w_0) \, \pi_e(w_1' \mid w_1) \times 1$$


so

$$p(X(w_2) = x_2 \mid X(w_1) = x_1) = \frac{\pi(w_1 \mid w_0) \, \pi_e(w_1' \mid w_1) \, \pi(w_2 \mid w) \, \pi_e(w_2' \mid w_2)}{\pi(w_1 \mid w_0) \, \pi_e(w_1' \mid w_1)}$$

$$= \pi(w_2 \mid w) \, \pi_e(w_2' \mid w_2)$$

Now consider the event $(X(w_2) = x_2) = (I(w) = 1, X(w_2) = x_2)$. Analogously with above we can write:

$$p(X(w_2) = x_2) = \pi(w \mid w_0) \, \pi(w_2 \mid w) \, \pi_e(w_2' \mid w_2) \, \pi(w_\infty \mid w_2')$$

$$= 1 \times \pi(w_2 \mid w) \, \pi_e(w_2' \mid w_2) \times 1$$

$$= p(X(w_2) = x_2 \mid X(w_1) = x_1)$$

It is straightforward to show that the result also holds for $x_1 = 0$ and $x_2 = 0$.

If $w_2$ is itself a cut-vertex (with $w_1 \prec w_2$), then we replace $I(w) = 1$ by $I(w_2) = 1$ in the above argument with the same result.

So a sufficient condition for $X(w_1) \perp\!\!\!\perp X(w_2)$ is that either $w_2$ is itself a cut-vertex, or there exists a cut-vertex $w$ such that $w_1 \prec w \prec w_2$.
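
The sufficient condition can be checked by brute force on a small graph. The sketch below assumes a hypothetical toy CEG in which `w3` is a cut-vertex; the names `CEG`, `routes`, `x_value`, `joint` and `is_independent` are introduced here for illustration. Here `x_value` returns the label of the edge a route takes out of a position, and 0 if the route misses that position.

```python
# Hypothetical toy CEG: position -> list of (edge label, child position, edge probability).
# Every route passes through w3, so w3 is a cut-vertex.
CEG = {
    "w0": [(1, "w1", 0.3), (2, "w2", 0.7)],
    "w1": [(1, "w3", 0.4), (2, "w3", 0.6)],
    "w2": [(1, "w3", 0.5), (2, "w3", 0.5)],
    "w3": [(1, "w4", 0.2), (2, "w_inf", 0.8)],
    "w4": [(1, "w_inf", 0.9), (2, "w_inf", 0.1)],
}


def routes(ceg, source, sink):
    """Yield (route, probability); a route is a tuple of (position, label, child) edges."""
    stack = [(source, (), 1.0)]
    while stack:
        node, path, prob = stack.pop()
        if node == sink:
            yield path, prob
            continue
        for label, child, p in ceg[node]:
            stack.append((child, path + ((node, label, child),), prob * p))


def x_value(route, w):
    """X(w) on a route: the label of the edge leaving w, or 0 if the route misses w."""
    return next((label for pos, label, _ in route if pos == w), 0)


def joint(ceg, source, sink, wa, wb):
    """Joint distribution of (X(wa), X(wb)) obtained by enumerating all routes."""
    dist = {}
    for route, p in routes(ceg, source, sink):
        key = (x_value(route, wa), x_value(route, wb))
        dist[key] = dist.get(key, 0.0) + p
    return dist


def is_independent(dist, tol=1e-12):
    """True if the joint distribution factorises into the product of its marginals."""
    pa, pb = {}, {}
    for (xa, xb), p in dist.items():
        pa[xa] = pa.get(xa, 0.0) + p
        pb[xb] = pb.get(xb, 0.0) + p
    return all(abs(dist.get((xa, xb), 0.0) - pa[xa] * pb[xb]) < tol for xa in pa for xb in pb)


print(is_independent(joint(CEG, "w0", "w_inf", "w1", "w4")))  # cut-vertex w3 lies between w1 and w4
print(is_independent(joint(CEG, "w0", "w_inf", "w1", "w3")))  # w3 is itself a cut-vertex
print(is_independent(joint(CEG, "w0", "w_inf", "w0", "w1")))  # no cut-vertex at or before w1
```

The third pair is included as a contrast: with no cut-vertex at `w1` or strictly between `w0` and `w1`, the joint distribution does not factorise in this example.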

2. Necessary conditions for independence. 

Let $X(w_1) \perp\!\!\!\perp X(w_2)$ (and since $I(w)$ is a function of $X(w)$, $X(w_1) \perp\!\!\!\perp I(w_2)$ and $I(w_1) \perp\!\!\!\perp I(w_2)$). Let the set of routes of $C$ be partitioned into four subsets. Call a route Type A if it passes through $w_2$, but not through $w_1$, Type B if it passes through neither $w_1$ nor $w_2$, Type C if it passes through both $w_1$ and $w_2$, and Type D if it passes through $w_1$, but not through $w_2$. Our proof proceeds as follows:

(a) We show that we must have $w_1 \prec w_2$ (ie. the set of Type C routes is non-empty).

(b) We show that every route intersects with every other route at some point downstream of $w_0$ and upstream of $w_\infty$. If two $w_0 \to w_\infty$ routes share no vertices except $w_0$ and $w_\infty$, we call them internally disjoint. So there cannot be two internally disjoint $w_0 \to w_\infty$ routes in $C$.

(c) We show that there must therefore be a cut-vertex between $w_0$ and $w_\infty$.

(d) We show that either $w_1$ is a cut-vertex or $w_2$ is a cut-vertex, or there exists a cut-vertex $w$ such that $w_1 \prec w \prec w_2$.

(e) Finally we show that if $w_1$ is a cut-vertex then there must also either be a cut-vertex at $w_2$ or a cut-vertex $w$ such that $w_1 \prec w \prec w_2$.

(a) Suppose that $w_1 \nprec w_2$ (and recall that $w_2 \nprec w_1$). Then

$$p(I(w_2) = 1 \mid I(w_1) = 1) = 0. \qquad I(w_1) \perp\!\!\!\perp I(w_2) \ \Rightarrow\ p(I(w_2) = 1) = 0 \ \Rightarrow\ I(w_2) \equiv 0.$$

This is impossible by construction. Therefore $w_1 \prec w_2$.

(b) We first show that each Type C route intersects with every other route at $w_1$ or at $w_2$ or at some point between these positions.

Let $\lambda_1$ be a Type C route, and $\mu_1(w_1, w_2)$ the subpath coincident with $\lambda_1$ between $w_1$ and $w_2$. If the set of Type B routes is non-empty then let $\lambda_2$ be a Type B route which does not intersect with $\mu_1$ (ie. $\lambda_2$ and $\mu_1$ have no positions or edges in common).

Consider a distribution $P$ which assigns (1) a probability of $1 - \epsilon$ to every edge of the subpath $\mu_1(w_1, w_2)$, and (2) a probability of $1 - \delta$ to each edge of the route $\lambda_2$. Let the number of edges in $\mu_1(w_1, w_2)$ be $n(\mu_1)$ and the number of edges in $\lambda_2$ be $n(\lambda_2)$ (where both $n(\mu_1)$ and $n(\lambda_2)$ are finite). Then let $(1 - \epsilon)^{n(\mu_1)} > 0.9$ and $(1 - \delta)^{n(\lambda_2)} > 0.8$. If $\lambda_2$ does not intersect with $\mu_1$ then this is always possible.
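
That suitable values of $\epsilon$ and $\delta$ always exist for finite edge counts can be seen by solving the inequality directly. The helper below is a small sketch; the name `edge_slack` and the choice of taking half of the admissible bound are assumptions made for illustration, and any value strictly inside the bound would do.

```python
def edge_slack(n_edges, target):
    """Return a value s in (0, 1) with (1 - s) ** n_edges > target.

    (1 - s) ** n > target  is equivalent to  s < 1 - target ** (1 / n),
    so half of that upper bound is used here (an arbitrary illustrative choice).
    """
    if n_edges < 1 or not 0.0 < target < 1.0:
        raise ValueError("need n_edges >= 1 and 0 < target < 1")
    return 0.5 * (1.0 - target ** (1.0 / n_edges))


# Hypothetical edge counts for the subpath and for the Type B route.
eps = edge_slack(4, 0.9)
delta = edge_slack(6, 0.8)
print(eps, (1 - eps) ** 4)
print(delta, (1 - delta) ** 6)
```

The two assignments can be made simultaneously only because the subpath and the route share no positions or edges, so no floret is asked to carry both a $1 - \epsilon$ edge and a $1 - \delta$ edge.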

Under $P$, assignment (1) gives us that

$$p(I(w_2) = 1 \mid I(w_1) = 1) \geq (1 - \epsilon)^{n(\mu_1)} > 0.9$$

and $I(w_1) \perp\!\!\!\perp I(w_2)$ implies that under this $P$

$$p(I(w_2) = 1 \mid I(w_1) = 0) > 0.9 \ \Rightarrow\ p(I(w_2) = 0 \mid I(w_1) = 0) < 0.1$$

But assignment (2) gives us that

$$p(I(w_2) = 0) \geq p(I(w_1) = 0, I(w_2) = 0) \geq p(\lambda_2) = (1 - \delta)^{n(\lambda_2)} > 0.8$$

$$\Rightarrow\ p(I(w_2) = 0 \mid I(w_1) = 0) > 0.8$$

The assumption $I(w_1) \perp\!\!\!\perp I(w_2)$ is incompatible with the assignments of (1) and (2). But these assignments are always possible if $\lambda_2$ does not intersect with $\mu_1$. Hence $\lambda_2$ must intersect with $\mu_1$.

Hence each Type C route intersects with every Type B route at some point downstream of $w_1$ and upstream of $w_2$. Also each Type C route intersects with every Type A route (at $w_2$), with every Type D route (at $w_1$) and with every other Type C route (at both $w_1$ and $w_2$).

We now consider routes that are not of Type C. If the set of non-Type C routes is non-empty let $\lambda_3, \lambda_4$ be members of this set which do not intersect except at $w_0$ and $w_\infty$. Let $\mu(w_1, w_2)$ be a subpath between $w_1$ and $w_2$.

From above both $\lambda_3$ and $\lambda_4$ must intersect with $\mu$. Let $\lambda_3$ intersect with $\mu$ only at the positions $w_{31}, \ldots, w_{3m}$, where $w_{31} \prec \cdots \prec w_{3m}$; and let $\lambda_4$ intersect with $\mu$ only at the positions $w_{41}, \ldots, w_{4n}$, where $w_{41} \prec \cdots \prec w_{4n}$. Without loss of generality let $w_1 \preceq w_{31} \prec w_{41} \preceq w_2$, so that $\lambda_3$ could be a route of Type B or Type D, and $\lambda_4$ could be a route of Type A or Type B.

Suppose firstly that $w_{4n} \prec w_{3m}$. Consider the subpath $\mu_5(w_1, w_2)$ which coincides with $\mu$ from $w_1$ to $w_{31}$ (if $w_{31} \neq w_1$), coincides with $\lambda_3$ from $w_{31}$ to $w_{3m}$, and coincides with $\mu$ from $w_{3m}$ to $w_2$. This subpath $\mu_5$ does not intersect with the route $\lambda_4$. This is impossible since every route in $C$ intersects with every $\mu(w_1, w_2)$ subpath.

Suppose therefore that $w_{3m} \prec w_{4n}$. Consider the subpath $\mu_6(w_1, w_\infty)$ which coincides with $\mu$ from $w_1$ to $w_{31}$ (if $w_{31} \neq w_1$) and coincides with $\lambda_3$ from $w_{31}$ to $w_\infty$; and the subpath $\mu_7(w_0, w_2)$ which coincides with $\lambda_4$ from $w_0$ to $w_{4n}$ and coincides with $\mu$ from $w_{4n}$ to $w_2$ (if $w_{4n} \neq w_2$). Consider also a distribution $P$ which assigns (1) a probability of $1 - \epsilon$ to every edge of $\mu_6$, and (2) a probability of $1 - \delta$ to every edge of $\mu_7$. Let the number of edges in $\mu_6(w_1, w_\infty)$ be $n(\mu_6)$ and the number of edges in $\mu_7(w_0, w_2)$ be $n(\mu_7)$ (where both $n(\mu_6)$ and $n(\mu_7)$ are finite). Then let $(1 - \epsilon)^{n(\mu_6)} > 0.9$ and $(1 - \delta)^{n(\mu_7)} > 0.8$. If $\lambda_3$ and $\lambda_4$ do not intersect then this is always possible.

Under $P$, assignment (1) gives us that

$$p(I(w_2) = 0 \mid I(w_1) = 1) \geq (1 - \epsilon)^{n(\mu_6)} > 0.9$$

and $I(w_1) \perp\!\!\!\perp I(w_2)$ implies that under this $P$

$$p(I(w_2) = 0 \mid I(w_1) = 0) > 0.9 \ \Rightarrow\ p(I(w_2) = 1 \mid I(w_1) = 0) < 0.1$$

But assignment (2) gives us that

$$p(I(w_2) = 1 \mid I(w_1) = 0) > 0.8$$ *

The assumption $I(w_1) \perp\!\!\!\perp I(w_2)$ is incompatible with the assignments of (1) and (2). But these assignments are always possible if $\lambda_3$ and $\lambda_4$ do not intersect. Hence $\lambda_3$ and $\lambda_4$ must intersect.

Hence each Type B route intersects with every Type A, Type B or Type D route, and each Type A route intersects with every Type D route. Also, each Type A route intersects with every other Type A route (at $w_2$), and each Type D route intersects with every other Type D route (at $w_1$). So each route in $C$ intersects with every other route downstream of $w_0$ and upstream of $w_\infty$.

Hence there cannot be two internally disjoint directed routes from $w_0$ to $w_\infty$.

(c) To show that this implies the existence of a cut-vertex between $w_0$ and $w_\infty$, we briefly consider a CEG as a Flow Network where every edge and every vertex (except $w_0$ and $w_\infty$) has a (flow) capacity of one. Then the maximum flow through the CEG from $w_0$ to $w_\infty$ must equal the maximum number of internally disjoint $w_0 \to w_\infty$ routes. We can now use Ford & Fulkerson's Max Flow Min Cut Theorem m. This theorem applies to networks where only the edges are given capacities, so we replace each vertex $w \in V(C) \setminus \{w_0, w_\infty\}$ by a pair of vertices $w^-, w^+$ connected by an edge $e(w^-, w^+)$ with a capacity of one - the only edge emanating from $w^-$ being $e(w^-, w^+)$ and the only edge entering $w^+$ being $e(w^-, w^+)$.
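
The vertex-splitting construction and the resulting unit-capacity flow computation can be sketched as follows. This is an illustrative implementation only: it uses breadth-first augmenting paths (the Edmonds-Karp variant of the Ford-Fulkerson method), and the function names, the `"-"`/`"+"` tagging of split vertices and the example edge lists are all assumptions introduced here.

```python
from collections import defaultdict, deque


def split_vertex_capacities(edges, source, sink):
    """Replace each internal vertex w by (w, '-') -> (w, '+') with capacity one.

    `edges` is a list of (tail, head) pairs; parallel edges may be repeated.
    """
    def out_node(w):
        return w if w in (source, sink) else (w, "+")

    def in_node(w):
        return w if w in (source, sink) else (w, "-")

    cap = defaultdict(int)
    for u, v in edges:
        cap[(out_node(u), in_node(v))] += 1  # each original edge has capacity one
        for w in (u, v):
            if w not in (source, sink):
                cap[((w, "-"), (w, "+"))] = 1  # vertex capacity of one
    return cap


def max_flow(cap, source, sink):
    """Maximum flow by repeatedly augmenting along shortest residual paths."""
    residual = defaultdict(int, cap)
    adj = defaultdict(set)
    for u, v in cap:
        adj[u].add(v)
        adj[v].add(u)  # reverse arcs are needed for the residual graph
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck


# Hypothetical example: every route passes through w3, so at most one internally disjoint route.
CEG_EDGES = [("w0", "w1"), ("w0", "w2"), ("w1", "w3"), ("w1", "w3"), ("w2", "w3"),
             ("w2", "w3"), ("w3", "w4"), ("w3", "w_inf"), ("w4", "w_inf"), ("w4", "w_inf")]
DIAMOND = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]

print(max_flow(split_vertex_capacities(CEG_EDGES, "w0", "w_inf"), "w0", "w_inf"))
print(max_flow(split_vertex_capacities(DIAMOND, "s", "t"), "s", "t"))
```

A maximum flow of one corresponds to the situation in the proof, where no two internally disjoint routes exist; the diamond graph is included as a contrast in which two such routes exist.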

The theorem states that for a Flow Network with a single source and a single sink, the maximum flow from source to sink equals the capacity of the minimum cut, where cuts pass through the edges of the graph (ie. a cut partitions $V(C)$ into two collections of vertices with $w_0$ in one collection and $w_\infty$ in the other), and the capacity of the minimum cut is the sum of the capacities of the edges which are cut.

So if in our CEG we have no pairwise internally disjoint $w_0 \to w_\infty$ routes, then the maximum flow through the CEG from $w_0$ to $w_\infty$ must equal one, and the capacity of the minimum cut of the CEG must also equal one. Hence all $w_0 \to w_\infty$ routes must pass through a single edge.

Now this edge may be of the form $e(w^-, w^+)$, in which case $w$ is a cut-vertex; or the edge may be of the form $e(w_a, w_b)$ for $w_a \neq w_b$, in which case both $w_a$ and $w_b$ are cut-vertices. Hence there is a cut-vertex $w$ such that $w_0 \prec w \prec w_\infty$.
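
For small graphs the cut-vertices themselves can be found directly, without a flow computation, by deleting each internal vertex in turn and testing whether the sink is still reachable. The sketch below is a brute-force alternative given for illustration, not the argument used in the proof; `cut_vertices` and the example edge lists are hypothetical.

```python
from collections import defaultdict


def cut_vertices(edges, source, sink):
    """Return the internal vertices that lie on every directed source -> sink route."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def sink_reachable_without(removed):
        # Depth-first search from the source that never enters `removed`.
        seen, stack = {source}, [source]
        while stack:
            u = stack.pop()
            if u == sink:
                return True
            for v in succ[u]:
                if v != removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    internal = {w for edge in edges for w in edge} - {source, sink}
    return {w for w in internal if not sink_reachable_without(w)}


# Hypothetical examples.
CEG_EDGES = [("w0", "w1"), ("w0", "w2"), ("w1", "w3"), ("w2", "w3"),
             ("w3", "w4"), ("w3", "w_inf"), ("w4", "w_inf")]
CHAIN = [("s", "a"), ("a", "b"), ("b", "t")]
DIAMOND = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]

print(cut_vertices(CEG_EDGES, "w0", "w_inf"))
print(cut_vertices(CHAIN, "s", "t"))
print(cut_vertices(DIAMOND, "s", "t"))
```

The chain example corresponds to the second case in the text: the single edge through which all routes pass joins two distinct positions, and both of them are reported as cut-vertices.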

This result can also be arrived at by using a corollary of Whitney's [31] Theorem 7 (a result for undirected graphs, sometimes described as the 2nd variation of Menger's Theorem [23]).

(d) Suppose there exists a cut-vertex upstream of $w_1$. Then relabel this cut-vertex as $w_0$ and repeat the argument of (b)(c) to show that there exists a cut-vertex between this new $w_0$ and $w_\infty$. Since the number of positions in $C$ is finite, repeated use of this argument shows us that either $w_1$ is a cut-vertex or there exists a cut-vertex downstream of $w_1$. A complementary argument shows that there exists a cut-vertex at $w_2$ or upstream of $w_2$.

(e) Suppose $w_1$ is a cut-vertex, but $w_2$ is not. Then either (i) $w_2$ lies exactly one edge downstream of $w_1$ on every $w_1 \to w_2$ subpath, or (ii) there exists a position $w_1^1$ ($\neq w_2$) exactly one edge downstream of $w_1$ lying on a $w_1 \to w_2$ subpath.

(i) We know that $X(w_1) \neq 0$ (since $w_1$ is a cut-vertex), so if $X(w_1)$ takes a value corresponding to an edge from $w_1$ to $w_2$, then $I(w_2) = 1$ and $X(w_2) > 0$; otherwise $I(w_2) = X(w_2) = 0$. So $X(w_2) \not\perp\!\!\!\perp X(w_1)$. *

(ii) If $X(w_1)$ takes a value corresponding to an edge from $w_1$ to $w_1^1$, then $I(w_1^1) = 1$; otherwise $I(w_1^1) = 0$. Hence $I(w_1^1)$ is a function of $X(w_1)$. So $X(w_1) \perp\!\!\!\perp X(w_2) \Rightarrow X(w_1) \perp\!\!\!\perp I(w_2) \Rightarrow I(w_1^1) \perp\!\!\!\perp I(w_2)$, and using the argument of (b), (c), (d) above there must be a cut-vertex at $w_1^1$ or between $w_1^1$ and $w_2$.

Therefore there exists a cut-vertex at $w_2$ or a cut-vertex $w$ such that $w_1 \prec w \prec w_2$.

□ 


Proof of Corollary 1: 

Let $X(w_1) \perp\!\!\!\perp X(w_2)$ hold for some $w_1 \in W_a$, $w_2 \in W_b$. Then by Theorem 2 either (i) $w_2$ is a cut-vertex (in which case $W_b$ consists of the one position $w_2$), or (ii) there exists a cut-vertex $w$ such that $w_1 \prec w \prec w_2$.

Since $W_a$ and $W_b$ are position cuts, this implies that either (i) $w_a \prec w_2\ \forall w_a \in W_a$, or (ii) $w_a \prec w \prec w_b\ \forall w_a \in W_a, w_b \in W_b$, and hence (i) $X(w_a) \perp\!\!\!\perp X(w_2)\ \forall w_a \in W_a$, or (ii) $X(w_a) \perp\!\!\!\perp X(w_b)\ \forall w_a \in W_a, w_b \in W_b$.

Note that $X(w_a)$, $X(w_b)$ pairwise independent for all $w_a, w_b$ does not in general imply groupwise independence, but it does here:

Any event characterised by the expression $X_{W_a} = x_a$ has the form:

$$X(w_a') = x_a\ (\neq 0) \text{ for some } w_a' \in W_a, \qquad X(w_a) = 0\ \ \forall w_a \in W_a \setminus \{w_a'\}$$


So

$$p(X_{W_a} = x_a, X_{W_b} = x_b)$$

$$= p(X(w_a') = x_a,\ X(w_b') = x_b,\ X(w) = 0\ \ \forall w \in W_a \cup W_b \setminus \{w_a', w_b'\}) \quad \text{for some } w_a' \in W_a,\ w_b' \in W_b$$

$$= p(X(w_a') = x_a,\ X(w_b') = x_b) \quad \text{since } X(w_a') \neq 0 \Rightarrow X(w_a) = 0\ \ \forall w_a \in W_a \setminus \{w_a'\} \text{ etc}$$

$$= p(X(w_a') = x_a) \; p(X(w_b') = x_b) \quad \text{since } X(w_a') \perp\!\!\!\perp X(w_b')$$

$$= p(X(w_a') = x_a,\ X(w_a) = 0\ \ \forall w_a \in W_a \setminus \{w_a'\}) \times p(X(w_b') = x_b,\ X(w_b) = 0\ \ \forall w_b \in W_b \setminus \{w_b'\})$$

$$= p(X_{W_a} = x_a) \; p(X_{W_b} = x_b)$$

So $X_{W_a} \perp\!\!\!\perp X_{W_b}$. But $X(W_a) = \sup_{w_a \in W_a} X(w_a)$ is a function of $X_{W_a}$, and $X(W_b)$ is a function of $X_{W_b}$. Hence $X(W_a) \perp\!\!\!\perp X(W_b)$.

□ 
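
The cut-variable $X(W) = \sup_{w \in W} X(w)$ and the resulting independence can be illustrated on a small example. The sketch assumes a hypothetical toy CEG with position cuts `{w1, w2}` and `{w4, w5}` separated by the cut-vertex `w3`; all names (`CEG`, `routes`, `cut_value`, `joint_cut`, `is_independent`) are introduced here for illustration. Edge labels are taken to be positive integers so that the supremum picks out the label on the one position of the cut that a route visits.

```python
# Hypothetical toy CEG: position -> list of (edge label, child position, edge probability).
CEG = {
    "w0": [(1, "w1", 0.3), (2, "w2", 0.7)],
    "w1": [(1, "w3", 0.4), (2, "w3", 0.6)],
    "w2": [(1, "w3", 0.5), (2, "w3", 0.5)],
    "w3": [(1, "w4", 0.2), (2, "w5", 0.8)],
    "w4": [(1, "w_inf", 0.9), (2, "w_inf", 0.1)],
    "w5": [(1, "w_inf", 0.25), (2, "w_inf", 0.75)],
}


def routes(ceg, source, sink):
    """Yield (route, probability); a route is a tuple of (position, label, child) edges."""
    stack = [(source, (), 1.0)]
    while stack:
        node, path, prob = stack.pop()
        if node == sink:
            yield path, prob
            continue
        for label, child, p in ceg[node]:
            stack.append((child, path + ((node, label, child),), prob * p))


def cut_value(route, cut):
    """X(W): the largest X(w) over w in the cut, where X(w) = 0 off the route."""
    return max(next((label for pos, label, _ in route if pos == w), 0) for w in cut)


def joint_cut(ceg, source, sink, cut_a, cut_b):
    """Joint distribution of (X(W_a), X(W_b)) obtained by enumerating all routes."""
    dist = {}
    for route, p in routes(ceg, source, sink):
        key = (cut_value(route, cut_a), cut_value(route, cut_b))
        dist[key] = dist.get(key, 0.0) + p
    return dist


def is_independent(dist, tol=1e-12):
    """True if the joint distribution factorises into the product of its marginals."""
    pa, pb = {}, {}
    for (xa, xb), p in dist.items():
        pa[xa] = pa.get(xa, 0.0) + p
        pb[xb] = pb.get(xb, 0.0) + p
    return all(abs(dist.get((xa, xb), 0.0) - pa[xa] * pb[xb]) < tol for xa in pa for xb in pb)


DIST = joint_cut(CEG, "w0", "w_inf", ("w1", "w2"), ("w4", "w5"))
print(DIST)
print(is_independent(DIST))
```

Since each route meets exactly one position of a position cut, `cut_value` is never zero in this example.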


Proof of Corollary 2: 

Since $\Lambda$ is intrinsic to $C$, $C_\Lambda$ is a subgraph of $C$ with $V(C_\Lambda) \subset V(C)$.

Let $W_a$ in $C_\Lambda$ be the subset of $V(C_\Lambda)$ which consists of elements of $W_a$ in $C$. Then $W_a$ is well-defined on $C_\Lambda$, as is $X(w_a)$ for any $w_a \in W_a$.

$X(W_a)$ is measurable with respect to the sigma-field of $C$, so it partitions the set of atoms of $C$. Since $\Lambda \subset \Lambda(C)$, it also partitions the set of atoms of $C_\Lambda$, and is well-defined on $C_\Lambda$ as:

$$X(W_a) = \sup_{w_a \in W_a,\ w_a \in V(C_\Lambda)} X(w_a).$$

Hence $p_\Lambda(X(W_a) = x_a) = p(X(W_a) = x_a \mid \Lambda)$, and all necessary terms are defined on $C_\Lambda$ consistently with their definitions on $C$.

In $C_\Lambda$ there exists a cut-vertex $w$ such that $W_a \prec w \prec W_b$, so by Theorem 2, $X(w_a) \perp\!\!\!\perp X(w_b)$ holds in $C_\Lambda$ for any $w_a \in W_a \cap V(C_\Lambda)$, $w_b \in W_b \cap V(C_\Lambda)$.

Hence by Lemma 1, $X(W_a) \perp\!\!\!\perp X(W_b) \mid \Lambda$ holds in $C$.

□